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Abstract 

We consider a framework for the construction of iterative schemes for operator 
equations that combine low-rank approximation in tensor formats and adaptive ap- 
proximation in a basis. Under fairly general assumptions, we obtain a rigorous con- 
vergence analysis, where all parameters required for the execution of the methods 
depend only on the underlying infinite-dimensional problem, but not on a concrete 
discretization. Under certain assumptions on the rates for the involved low-rank ap- 
proximations and basis expansions, we can also give bounds on the computational 
complexity of the iteration as a function of the prescribed target error. Our theo- 
retical findings are illustrated and supported by computational experiments. These 
demonstrate that problems in very high dimensions can be treated with controlled 
solution accuracy. 

Keywords: Low-rank tensor approximation, adaptive methods, high-dimensional 
operator equations, computational complexity 

Mathematics Subject Classification (2000): 41A46, 41A63, 65D99, 65J10, 
65N12, 65N15 

1 Introduction 

1.1 Motivation 

Any attempt to recover or approximate a function of a large number of variables with 
the aid of classical low-dimensional techniques is inevitably impeded by the curse of di- 
mensionality. This means that, when only assuming classical smoothness (e.g. in terms 
of Sobolev or Besov regularity) of order s > 0, the necessary computational work needed 
to realize a desired target accuracy e in d dimensions scales like e~ d ' s , i.e., one faces an 
exponential increase in the spatial dimension d. This can be ameliorated by dimension- 
dependent smoothness measures. In many high-dimensional problems of interest, the ap- 
proximand has bounded high-order mixed derivatives, which under suitable assumptions 
can be used to construct sparse grid-type approximations where the computational work 
scales like C^ 1 ' 8 '. Under such regularity assumptions, one can thus obtain a convergence 
rate independent of d. In general, however, the constant Cd will still grow exponentially in 
d. This has been shown to hold even under extremely restrictive smoothness assumptions 
in |27| , and has been observed numerically in a relatively simple but realistic example 
in |12|. 
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Hence, in contrast to the low-dimensional regime, regularity is no longer a sufficient 
structural property that ensures computational feasibility, and further low-dimensional 
structure of the sought high-dimensional object is required. Such a structure could be 
the dependence of the function on a much smaller (unknown) number of variables, see 
e.g. |11|. It could also mean sparsity with respect to some (a priori) unknown dictionary. In 
particular, dictionaries comprized of rank-one tensors g{x\, . . . , Xd) = gi(%i) ■ ■ ■ 9d(xd) ='■ 
(gi <8> • • • < ^ i gd}{x) open very promising perspectives and have recently attracted substantial 
attention. 

As a simple example consider g(x) = (S>i=i9i( x i) on the urn t cube O = [0, l] d , where 
the gi are sufficiently smooth. Employing for each factor gi a standard spline approxi- 
mation of order s with n knots yields an Loo-accuracy of order n~ s , which gives rise to 
an overall accuracy of the order of dn~ s at the expense of dn =: N degrees of freedom. 
Hence, assuming that \\g\\oo does not depend on d, an accuracy e requires 

N = N(e,d) ~d^e~ 1/s (1) 

degrees of freedom. In contrast, it would take the order oi N = n degrees of freedom to 
realize an accuracy of order n~ s = N~ d ' s when using a standard tensor product spline 
approximation, which means that in this case N(e,d) ~ e ' s . Thus, while the first 
approximation - using a nonlinear parametrization of a reference basis - breaks the curse 
of dimensionality, the second one obviously does not. 

Of course, u being a simple tensor is in general an unrealistic assumption, but the 
curse of dimensionality can still be significantly mitigated when / is well approximable by 
relatively short sums of rank-one tensors. By this we mean that for some norm || • || we 
have 

r(e) 






<e (2) 



where the rank r(e) grows only moderately as e decreases. In our initial example, in 
these terms we had r(e) = 1 for all e > 0. Assuming that all the factors gj t % in the above 
approximation are sufficiently smooth, the count (fTl) applied to each summand with target 
accuracy e/r shows that now at most 

N{e,d,r)<r x+ $drre-h (3) 

degrees of freedom are required, which is still acceptable. This is clearly a very crude 
reasoning because it does not take a possible additional decay in the rank-one summands 
into account. 

This argument, however, already indicates that good approximability in the sense of 
([2]) is not governed by classical regularity assumptions. Instead, the key is to exploit an 
approximate global low-rank structure of u. This leads to a highly nonlinear approximation 
problem, where one aims to identify suitable lower-dimensional tensor factors, which can 
be interpreted as a u-dependent dictionary. 

This discussion, although admittedly somewhat oversimplified, immediately raises sev- 
eral questions which we will briefly discuss as they guide subsequent developments. 

Format of approximation: The hope that r(e) in (pi) can be rather small is based on 
the fact that the rank-one tensors are allowed to "optimally adapt" to the approximand 
u. The format of the approximation used in ([2]) is sometimes called canonical since it is 
a formal direct generalization of classical Hilbert Schmidt expansions for d = 2. However, 
a closer look reveals a number of well-known pitfalls. In fact, they are already encoun- 
tered in the discrete case. The collection of sums of ranks one tensors of a given length 



is not closed, and the best approximation problem is not well-posed, see e.g. 10 . There 
appears to be no reliable computational strategy that can be proven to yield near-minimal 
rank approximations for a given target accuracy in this format. In this work, we there- 
fore employ different tensor formats that allow us to obtain provably near-minimal rank 
approximations, as explained later. 

A two-layered problem: Given a suitable tensor format, even if a best tensor approxi- 
mation is known in the infinite-dimensional setting of the continous problem, the resulting 
lower- dimensional factors still need to be approximated. Since finding these factors is part 
of the solution process, the determination of efficient discretizations for these factors will 
need to be intertwined with the process of finding low-rank expansions. We have chosen 
here to organize this process through selecting low-dimensional orthonormal wavelet bases 
for the tensor factors. However, other types of basis expansions would be conceivable as 
well. 

The issue of the total complexity of tensor approximations, taking the approximation 



of the involved lower-dimensional factors into account, is addressed in 17,31 



1.2 Conceptual Preview 

The problem of finding a suitable format of tensor approximations has been extensively 
studied in the literature over that past years, however, mainly in the discrete or finite- 



dimensional setting, see e.g. 15, 20 22 28, 30 . Some further aspects in a function space 



setting have been addressed e.g. in |13l35,36|. For an overview and further references we 



also refer to [IS] and the recent survey 16 . A central question in these works is: given a 
tensor, how can one in a stable manner obtain low-rank approximations, and how accurate 
are they when compared with best tensor approximations in the respective format? 

We shall heavily draw on these findings in the present paper, but under the following 
somewhat different perspectives. First of all, we are interested in the continuous infinite- 
dimensional setting, i.e., in sparse tensor approximations of a function which is a priori 
not given in any finite tensor format but which one may expect to be well approximable by 
simple tensors in a way to be made precise later. We shall not discuss here the question 
under which concrete conditions this is actually the case. Moreover, the objects to be 
recovered are not given explicitly but only implicitly as a solution to an operator equation 

An = f, (4) 

where A : V — > V is an isomorphism of some Hilbert space V onto its dual V . One may 
think of V, in the simplest instance, as a high-dimensional L2 space, or as a Sobolev space. 
More generally, as in the context of parametric diffusion problems, V could be a tensor 
product of a Sobolev space and an L2 space. Accordingly, we shall always assume that we 
have a Gelfand triplet 

V C H = H' C V, (5) 

in the sense of dense continuous embeddings, where we assume that H is a tensor product 
Hilbert space, that is, 

H = H l ®---®H d (6) 

with lower-dimensional Hilbert spaces Hi. A typical example would be H = L^O ) = 
L2(r2) ® • • • (8) L2(r2) for a domain Q of small spatial dimension. 

The main contribution of this work is to put forward a strategy that addresses the main 
obstacles identified above and results in an algorithm which, under mild assumptions, can 
be rigorously proven to provide for any target accuracy e an approximate solution of near- 
minimal rank and representation complexity of the involved tensor factors. Specifically, (i) 



it is based on stable tensor formats relying on optimal subspaces; (ii) successive solution 
updates involve a combined refinement of ranks and factor discretizations; (iii) (near- 
)optimality is achieved, thanks to (i), through accompanying suitable subspace correction 
and coarsening schemes. 

The following comments on the main ingredients are to provide some orientation. 
A first essential step is to choose a universal basis for functions of a single variable in 
Hi. Here, we focus on wavelet bases, but other systems like the trigonometric system 
for periodic problems are conceivable as well. As soon as functions of a single variable, 
especially the factors in our rank-one tensors, are expanded in such a basis, the whole 
problem of approximating u reduces to approximating its infinite coefficient tensor u 
induced by the expansion 

u= y~] u v ^ v , V u := ip ui <g) •••®Vv d , u = (u u ) ueV d, 

uev d 

see below. The original operator equation Q is then equivalent to an infinite system 

Au = f, where A = ({AV V , ^»^, eVd , f = «/, **))„&*■ (7) 

For standard types of Sobolev spaces V it is well understood how to rescale the tensor 
product basis {^u} v £y d m suc h a way that it becomes a Riesz basis for V. This, in turn, 
together with the fact that kv^-v'(A) '■= \\A\\v-yv'\\^ 1 \\v'-^v is finite, allows one to show 
that k^ 2 _^ 2 (A) is finite. Hence one can find a positive ui such that ||I — uiA\\^ 2 ^.£ 2 < p < 1, 
i.e., the operator I — uA is a contraction so that the iteration 

u w :=u fc + w(f-Au fc ), k = 0,1,2,..., (8) 

converges for any initial guess to the solution u of (JTl). 

Of course, (p]) is only an idealization because the full coefficient sequences U& cannot 
be computed. Nevertheless, adaptive wavelet methods can be viewed as realizing pi) 
approximately, keeping possibly few wavelet coefficients "active" while still preserving 
enough accuracy to ensure convergence to u (see e.g. J8|[9]). 

In the present high-dimensional context this kind of adaptation is no longer feasi- 
ble. Instead, we propose here a "much more nonlinear" adaptation concept. Being able 
to keep increasingly accurate approximations on a path towards near-minimal rank ap- 
proximations with properly sparsified tensor factors relies crucially on suitable correction 
mechanisms. An important contribution of this work is to identify and analyze just such 
methods. Conceptually, they are embedded in a properly perturbed numerical realization 
of (pj of the form 

u fc+1 = C e3 ( fc )(P £2 ( fc )(u fc +w(f - Au fe ))), k = 0, 1,2, ..., (9) 

where P £i (fc), i = 1,2, C £3 (fc) are certain reduction operators and the £i(k), i = 1,2,3, are 
suitable tolerances which decrease for increasing k. 

More precisely, the purpose of P £ is to "correct" the current tensor expansion and, 
in doing so, reduce the rank subject to an accuracy tolerance e. We shall always refer 
to such a rank reduction operation as a recompression. For this operation to work as 
desired, it is essential that the employed tensor format is stable in the sense that the 
best approximation problem for any given ranks is well-posed. As explained above, this 
excludes the use of the canonical format. Instead we use the so-called hierarchical Tucker 
(HT) format, since on the one hand it inherits the stability of the Tucker format [13], as a 
classical best subspace method, while on the other hand it better ameliorates the curse of 
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dimensionality that the Tucker format may still be prone to. In ^2] we collect the relevant 
prerequisites. This draws to a large extent on known results for the finite-dimensional case, 
but requires proper formulation and extension of these notions and facts for the current 
sequence space setting. The second reduction operation C e , in turn, is a coarsening scheme 
that reduces the number of degrees of freedom used by the wavelet representations of the 
tensor factors, again subject to some accuracy constraint e. 

1.3 What is New? 

The use of rank reduction techniques in iterative schemes is in principle not new, see 
e.g. [3j[5j[6j[l9j[2lJ[23j[25] and the further references given in [l6J. To our knowledge, 
corresponding approaches can be subdivided roughly into two categories. In the first 
one, iterates are always truncated to a fixed tensor rank. This allows one to control the 
complexity of the approximation, but convergence of such iterations can be guaranteed 
only under very restrictive assumptions (e.g. concerning highly effective preconditioners) . 
In the second category, schemes achieve a desired target accuracy by instead prescribing 
an error tolerance for the rank truncations, but the corresponding ranks arising during 
the iteration are not controlled. A common feature of both groups of results is that they 
operate on a fixed discretization of the underlying continuous problems. 

In contrast, the principal novelty of the present approach can be sketched as follows. 
The first key element is to show that based on a known error bound for a given approxima- 
tion to the unknown solution, a judiciously chosen recompression produces a near-minimal 
rank approximation to the solution of the continous problem for a slightly larger accuracy 
tolerance. Moreover, the underlying projections are stable with respect to certain sparsity 
measures. As pointed out before, this reduction needs to be intertwined with a sufficiently 
accurate but possibly coarse approximation of the tensor factors. A direct coarsening of 
the full wavelet coefficient tensor would face the curse of dimensionality, and thus would be 
practically infeasible. The second critical element is therefore to introduce certain lower- 
dimensional quantities, termed tensor contractions, from which the degrees of freedom to 
be discarded in the coarsening are identified. This notion of contractions also serves to 
define suitable sparsity classes with respect to wavelet coefficients, facilitating a computa- 
tionally efficient, rigorously founded combination of tensor recompression and coefficient 
coarsening. 

These concepts culminate in the main result of this paper, which can be summarized 
in an admittedly oversimplified way as follows. 

Meta-Theorem: Whenever the solution to (f?l) has certain tensor-rank approximation 
rates and when the involved tensor factors have certain best N -term approximation rates, 
then a judicious numerical realization of the iteration M) realizes these rates. Moreover, 
up to logarithmic factors, the computational complexity is optimal. More specifically, for 
the smallest n such that the approximate solution u^ satisfies ||u^ — u||^ 2 < r, u^ has 
HT-ranks that can be bounded by a uniformly constant multiple of the smallest possible 
HT-ranks needed to realize accuracy r. 

To our knowledge this is the first result of this type, where convergence to the solution 
of the infinite- dimensional problem is guaranteed under realistic assumptions, and all 
ranks arising during the process remain proportional to the respective smallest possible 
ones. A rigorous proof of rank near optimality, using an iteration of the above type, is to be 
contrasted to approaches based a greedy approximation as employed in connection with the 
so-called PGD (proper generalized decomposition), see e.g. fl4l. The latter scheme aims 



at constructing an approximation in the (unstable) canonical format through successive 
greedy updates. This does, in principle, not seem to offer much hope for finding minimal 
or near-minimal rank approximations, as the greedy search operates far from orthonormal 
bases, and errors committed early in the iteration cannot easily be corrected. 

1.4 Layout 

The remainder of the paper is devoted to the development of the ingredients and their 
complexity analysis needed to make the statements in the above Meta-Theorem precise. 
Trying to carry out this program raises some issues which we will briefly address now, as 
they guide the subsequent developments. 

After collecting in ^2] some preliminaries, £J3] is devoted to a pivotal element of our 
approach, namely the development and analysis of suitable recompression and coarsening 
schemes that yield an approximation in the HT-format that is, for a given target accuracy, 
of near-minimal rank with possibly sparse tensor factors (in a sense to be made precise 
later) . 

Of course, one can hope that the solution of Q is particularly tensor sparse in the 
sense that relatively low HT-ranks already provide high accuracy if the data / are tensor 
sparse, and if the operator A (resp. A) is tensor sparse in the sense that its application 
does not increase ranks too drastically. Suitable models of operator classes that allow us to 
properly weigh tensor sparsity and wavelet expansion sparsity are introduced and analyzed 
in £|4j The approximate application of such operators with certified output accuracy builds 
on the findings in ^3} 

Finally, in ^5] we formulate an adaptive iterative algorithm and analyze its complexity. 
Starting from the coarsest possible approximation u° = 0, approximations in the tensor 
format are built successively, where the error tolerances in the iterative scheme are updated 
for each step in such a way that two goals are achieved. On the one hand, the tolerances 
are sufficiently stringent to guarantee the convergence of the iteration up to any desired 
target accuracy. On the other hand, we ensure that at each stage of the iteration, the 
approximations remain sufficiently coarse to realize the Meta-Theorem formulated above. 
Here we specify concrete tensor approximability assumptions on u, f and A that allow us 
to make its statement precise. 

2 Preliminaries 

In this section we set the notation and collect the relevant ingredients for stable tensor 
formats in the infinite-dimensional setting. In the remainder of this work, we shall use for 
simplicity the abbreviation ||-|| := ||-||^ 2! with the ^2-space on the appropriate index set. 

Our basic assumption is that we have a Riesz basis {^u} u ev d f° r V-> where V is a 
countable index set. In other words, we require that the index set has Cartesian prod- 
uct structure. Therefore any u G V can be identified with its basis coefficient sequence 
u := ((u, ^ u )) ue -yd G ^2(V d ) with uniformly equivalent norms. Thus, d will in general 
correspond to the spatial dimension of the domain of functions under consideration. In 
addition it can be important to reserve the option of grouping some of the variables in a 
possibly smaller number m < d of portions of variables, i.e., m G IN and d = d\ + . . . + d m 
for di G M. 

A canonical point of departure for the construction of {^ u } is a collection of Riesz 
bases for each component Hilbert space Hi (see ([6])), which we denote by {Vvvi/gv H *- 
To fit in the above context, we may assume without loss of generality that all V ' are 



identical, denoted by V. The precise structure of V is irrelevant at this point; however, 
in the case that the ipf?* are wavelets, each v = (j, k) encodes a dyadic level j = \u\ and a 
spatial index k = k{y). This latter case is of particular interest, since for instance when 

V is a Sobolev space, a simple rescaling of V^ 1 ® • • • ® ip^f yields a Riesz basis {*&„} for 

V C H as well. 

A simple scenario would be V = H = L2QO, 1] ), which is the situation considered in 
our numerical illustration in £|6j A second example are elliptic diffusion equations with 
stochastic coefficients. In this case, V = Hj(0)<g)L 2 ([-l, I] 00 ), and H = L 2 (tt x [-1, 1]°°). 
Here a typical choice of bases for L2Q— 1,1]°°) are tensor products of polynomials on 
[—1,1], while one can take a wavelet basis for Hg(fi), obtained by rescaling a standard L2 
basis. A third representative scenario concerns diffusion equations on high-dimensional 
product domains Q, d . Here, for instance, V = H l (Q d ) and H = Li2(fl d ). We shall comment 
on some additional difficulties that arise in the application of operators in this case in 
Remark [i~5l 

We now regard u as a tensor of order m on V = V * x • • • x V m . Rather than looking 
for approximations or representations in the canonical format 

r 
u = ^a fc UJL 1) ®---®UJ: d) , 

k=l 

we consider tensor representations of the form 

n r m 

-«*l,...,fc m V- 



"^•■■E^rf®-^;')' ( io ) 



Here the order-m tensor a = (ak 1 ,...,k rn )i<k % <r i :i=i,...,m is referred to as core tensor. The 
matrix IjW = (U^ fc .) i/ . eVd . 1<fe <r . with column vectors U^ G ^ 2 (V dl ), fc = 1, . . . ,n, is 
called the i-th mode frame, where we admit r% = 00, i = 1, . . . , m. When writing sometimes 



r(*h. _. oH-v^^v, +v, Q tt( j ) 



for convenience (U^, )kew> although the U^. may be specified through (10) only for k < n, 

(i) 

it will always be understood to mean Ui = 0, for k > T{. 



Note that in a representation of the form (10), by modifying a accordingly, one can 



always orthogonalize the columns of U^ so as to obtain 

{vf,vf ) ) = Si a , i = l,...,m. (11) 

We shall refer to XJ^' with the latter property as orthonormal mode frames. 

It will be convenient to introduce the following shorthand notation: we denote by U 



the system of orthonormal mode frames (U^) ._, so that (10) can be rewritten as 

m 

U k := (g) U^ } = Uf 0.-8UJ 1 , k = (fc, . . . , kr*) € W' 



If a is represented directly by its entries, (10) corresponds to the so-called Tucker 
format |33,34| or subspace representation. The hierarchical Tucker format |20| , as well as 
the special case of the tensor train format [301 , correspond to representations in the form 



( 10 ) as well, but use a further structured representation for the core tensor a. 



In this work, for multiindices in INq, t £ IN, we shall use the notational convention 
k = (fei,.. . ,k t ), n = (m,... ,nt), r = (n,. . . ,r t ), and so forth. 

2.1 Tucker format 

It is instructive to consider first the simpler case of the Tucker format in more detail. 



2.1.1 Some Prerequisites 

Clearly, any finitely supported u £ ^(V d ) can be represented in the form (10) for some 



r = (fj) £ M™. As mentioned before, for general u £ ^(V ), the sum in (10) may 



be infinite. Given a system of mode frames U, representing u in (10), we define the 
corresponding rank vector rank(u) £ (Mo U {oo}) m by its entries 

rankj(u) := dimspanjUJ, : k £ M} , i = 1, . . . , m . (12) 

It is not hard to see that this rank vector is independent of the specific mode frame system 
U. It is referred to as the multilinear rank of u with respect to U. For such rank vectors 
r £ (Mo U {oo}) m , we introduce the notation 

|r|oo := i 

j=l,...,m 



We can then define the set of those sequences u possessing a representation ( 10 ) of a given 
multilinear rank r £ (Mo U {oo}) m by 

T(r) := {u £ £ 2 (V d ) : rank^u) <r it i = l,...,m}. (13) 

The actual computational complexity will eventually depend on the "support" of the 
i-th mode frame given by 

oo 

fc=l 
Recall that if #supp.j(u) < oo, necessarily also rankj(u) < oo. 



sup Pi (u):= QsuppUJj?. (14) 



The following result, which can be found e.g. in (13 , 18 35 , ensures the existence of 
best approximations in T(r) also for infinite ranks. 

Theorem 1. Let u £ ^(V ') and < r, < rankj(u), then there exists v £ T(r) such that 

||u — v|| = min ||u — w||. 

rank(w)<r 



To simplify notation for the sums in (10), we define for r £ Mq 



k m ._ / X^Jl,...,^} ifminr>0, 
m[ j '~\ ifminr = 0. 

Moreover, for any vector x = (jCi)i=i,...,m and for i £ {1, . . . ,m}, we employ the notation 

Xj . — \Xl, . . . , Xj_l, Xi-\-\i • • • , x m ) , 

s I — t \ ( 5 ) 

to refer to the corresponding vector with entry i deleted or entry i replaced by y, respec- 
tively. We shall also need the auxiliary quantities 



" ;; " ~ J2 a k\ P a k\ q ' a p ): =\ a pp ( 16 ) 



ki£K m _i(ri) 



derived from the core tensor, where i £ {1, . . . , m} and p,q £ {1, . . . , r^}. 
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2.1.2 Higher-Order Singular Value Decomposition 



With the notions introduced above at hand, one can formulate an analogue of the singular 
value decomposition of matrices, the higher-order singular value decomposition (HOSVD) 



241, for the Tucker tensor format (10). In the following theorem, we summarize its prop- 



erties in the more general case of infinite-dimensional sequence spaces, where the singular 
value decomposition is replaced by the spectral theorem for compact operators. These 



facts could also be extracted from the treatment in 18, Section 8.3]. 

Theorem 2. For any u G &(V ) there exist orthonormal mode frames {U^ }keN> i 



l,...,m, with UJ.* 5 G ^ 2 (V 
£ 2 (M m ) such that 



' ^ which uniquely determine a core tensor a = (ak)keiNJ m ^ 

U = ^ a k U k • 



keIN" 



(i) 0) i h 

Moreover, with a P q and a p as in (16), the following hold: 



M 



CO 



,w 



(i) For alii G {1, . . . , m} we have (ff£ J )*ew G 4W, « nd ^ ^ °fc+i ^ ° f or alike IN 



U) 



(ii) For all i G {1, . . . , m} and all p,(j£M, we have a P q = <x 



,W|2; 



J P9- 



(hi) For eac/i r G IN™, we /lave 

a k U k 



u 



k£K m (r) 



< 



E El- 

i=l k=ri+l 



«|2\2 



< 



;;/ inf ||u — w| 

rank(w)<r 



tW 



(17) 



If in addition supp u C Ai X • • • X A TO C V for finite Aj C V * , £/ien supp Uj; C Aj and 
we /iai>e supp a C K m (r) wii/i r G IN™ satisfying ?i < $Aj /or i = 1, . . . , m. 

Proof. Let u = (u I/ ) iyeV d G ^ 2 (V rf ). For each i G {l,...,m} we consider the mode-i 
matricization of u, that is, the infinite matrix (?Vj>)„ e v d i pev d ~ d * with entries u v l , $. := u u 
for f G V , which defines a Hilbert-Schmidt operator 



rW : £ 2 (V d -*) -»■ £ 2 (V*) , (c,), evd - dl H- 



i> e V d_d i 



(^ 



By the spectral theorem, for each i there exist a nonnegative real sequence (<r„ ) n eiN; 
where o~n are the eigenvalues of ((T'^)*T' j ') , as well as orthonormal bases XJ^' = 
{Un }new f° r a subspace of £ 2 (V di ) and {V}, } n£ H for £ 2 (V d_a!i ) (again tacitly assuming 



that U$ 



r(>) 



V n = for n > dim range (Tu )), such that 



(*h 



J u 



neM 



u n \ v n ' / ^n • 



(18) 



u 



The representation ( 18 ) converges in the Hilbert-Schmidt norm, and as a consequence we 
have 

X^U^Sn)^, (19) 

neM 

with convergence in £ 2 (V d ). Furthermore, {U n } n elN m with U n := (^"jUnj is an or- 
thonormal system in £ 2 (V d ) (spanning a strict subspace of ^ 2 (V rf ) when |rank(u)|oo < oo), 
and setting a n : = (u, U n ), we have a = (a n ) G £ 2 (JN m ) and u = X^neM m a nU n - The further 
properties of the expansion can now be obtained along the lines of |24|, see also [2, 18 . □ 
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In what follows we shall denote by 

U(u) = U r (u) := {U (i) : i = 1, . . . , m, generated by HOSVD} (20) 

the particular system of orthonormal mode frames generated for a given u by HOSVD. 
It will occasionally be important to identify the specific tensor format to which a given 
system of mode frames refers, for which we use a corresponding superscript, such as in U 
for the Tucker format. 

The o4 are also referred to as mode-i singular values of u. Property (iii) in Theorem 
[2] leads to a simple procedure for truncation to lower multilinear ranks with an explicit 
error estimate in terms of the mode-z singular values. In this manner, one does not 
necessarily obtain the best approximation for prescribed rank, but the approximation is 
quasi-optimal in the sense that the error is at most by a factor yjm larger than the error 
of best approximation with the same multilinear rank. 

We shall make use of the following direct consequence of Theorems [2] 

Corollary 1. Let u G ^(V d ). For an HOSVD of u as in Theorem^ and for r with 
< fi < rankj(u) and 

m 

a= E <*(®ug), 

keK m (r) t=l 

we have 

m ranki(u) 

||u-uf <£ £ |a«| 2 (21) 

i=l k=ri+l 

and 

\\u — u\\ < \/m inf llu — wll. (22) 

weT(r) 

Remark 1. Suppose that for a finitely supported vector u on X7 d , we have a possibly 
redundant representation 

m 

u= £ a k (g)U«, 

keK m (r) t=l 

- u\ 
where the vectors U k , k = 1, . . . ,fj may be linearly dependent. Then by standard linear 

algebra procedures, we can obtain a HOSVD of u with a number of arithmetic operations 

that can be estimated by 

m 

Cm\r\Z +1 + C\r\l £ #sup Pi (u) . (23) 



where C > is an absolute constant (see, e.g., flSHj)- 

2.1.3 Projections 

We conclude this subsection with introducing canonical projections associated with any 
given system of orthonormal mode frames, since in what follows they serve as a major 
work horse. For any system of orthonormal mode frames V = (V^) ._, with r% columns 
(where r% could be infinity), we consider the "V '-rigid Tucker class 

T(V, r) := {P v v : v G ^ 2 (V d )}, (24) 
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where 

m m 

Pw:= £ (®V«,v)((g)v2 ) ), (25) 

keK m (r) i=l i=l 

and 

m 

<v,(8)v^)=E^2 fci ...vS m . 

Moreover, for any < Sj < rj, the truncated projection 

m m 

Pv,s = P^, s := £ (®vi?,')(<g>V«>), (26) 

keK m (s) t'=l t=l 

is a simple mechanism to discard some of the columns producing the best approximation 
from the reduced spaces T(V, s). We shall sometimes use the superscript T to indicate 
the reference to a specific tensor format, here the Tucker format. With these definitions, 
we can write 

m m 

p v , r v= x: (v,v k ) Vk := x: (v,®vg>®vg). 

keK m (r) k£K m (r) j=l j=l 

Note that at this point we do not require the V ? to have finite support. 

Corollary 2. For u G ^2(V d ) and r = (n)™ x G W^ wii/i < n < rankj(u), i = 1, . . . , m, 
there exists an orthonormal mode frame system U(u, r) such that 

llu — Pfri-,, r \ ull = min llu — wll, 

v ( u ' r > weT(r) 

wt/i Pjj( u r) ^ven by ( |25[ ) . 

Proof. By Theorem [TJ a best approximation of ranks r for u, 

u £ argmin{||u — v|| : rank a (u) < r a } , 

exists. Defining U(u, r) := U(u) as the orthonormal mode frame system for u, given by 
the HOSVD, we obtain the assertion. □ 

2.2 The Hierarchical Tucker Format 

The Tucker format as it stands, in general, still gives rise to an increase of degrees of 
freedom that is exponential in d. One way to mitigate the curse of dimensionality is to 



further decompose the core tensor a in (10). We now briefly formulate the relevant notions 



concerning the hierarchical Tucker Format in the present sequence space context, following 



essentially the developments in [15 , 20 , see also [18 



2.2.1 Dimension Trees 

Definition 1. Let m G IN, m > 2. A set T> m C 2^ lv "' m J is called a dimension tree if the 
following hold: 

(i) {1, . . . , m} G V m and for each i G {1, . . . , m}, we have {i} G V m . 
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(ii) Each a G V m is either a singleton or there exist unique disjoint a±, cti G V m , called 
children of a, such that a = a± U a.<i- 

Singletons {i} G V m are referred to as leaves, 

m := {l,...,m} 

as root, and elements of I(V m ) := V m \ {0 m , {1}, . . . , {m}} as interior nodes. The set 
of leaves is denoted by C(V m ), where we additionally set M{V m ) := V m \ C(V m ) = 
I(V m ) U {0 m }. When an enumeration of C(V m ) is required, we shall always assume the 
ascending order with respect to the indices, i.e., in the form {{1}, . . . , {m}}. 

It will be convenient to introduce the two functions 

Cj : V m \C{V m ) -^V m \{0 m }, Ci(a):=ai, i = 1, 2 , 

producing the "left" and "right" children of a non-leaf node a which, in view of Definition 
[TJ are well-defined up to their order, which we fix by the condition minai < mina2- 

Note that for a binary dimension tree as defined above, #P m = 2m— 1 and #N(V m ) = 

771—1. 

Remark 2. The restriction to binary trees in Definition [7] is not necessary, but leads 
to the most favorable complexity estimates for algorithms operating on the resulting ten- 
sor format. With this restriction dropped, the Tucker format ( |10[ ) can be treated in the 
same framework, with the m-ary dimension tree consisting only of root and leaves, i.e., 
{O m ,{l},...,{m}}. 

Definition 2. We shall refer to a family 

U = {U[ a) G £ 2 (V^^ d ^) : a£V m \ {0 m },k = l,...,k a }, 

with fc Q £ IU {co} for each a £ T> m \ {0 m }, as hierarchical mode frames. In addition, 
these are called orthonormal if for all a G T> m \ {0 m }, we have (\j\ a ,XJ a } = 5ij for 
i, j = 1, . . . , k a , and nested if 

spanlUJ^: k = l,...,k a } 

C span{UJf (a)) : k = 1, . . . , k Cl{a) } 8) spanlUJf (a)) : k = 1, . . . , k C2Q } . 

As for the Tucker format, we set U' 4 ^ := U"*^, and for k G IN" 1 we write 

m 

U k :=(g)U«. 
i=l 

Again to express that U is associated with the hierarchical format we sometimes write 

Let T> m be a dimension tree, let a G I(V m ) be an interior node, and j3 := {1, . . . , m}\a. 
For u G ^2(V d ), we define the Hilbert-Schmidt operator 

^) : ^(V^*) 4ft( v& £ .4) | c ^ ( E ^,kA , . ( 27 ) 






and set 



rank a (u) := dim range Tu , a G £> m \ m . 
14 



To be consistent with our previous notation for leaf nodes {i} E T> m , we use the abbre- 
viation rankj(u) := rankm(u). Again, rank a (u) can be infinite. The root element of the 
dimension tree, m = {1, . . . , m} E V m , is a special case. Here we define 

T^ m) : R^£ 2 (V d ), t^tu 

and correspondingly set 

rank 0m := 1 , U ( ° m) := u , u£° m) := , k > 1 . 

To be consistent with the Tucker format we denote by 

rank(u) := (rank a (u)) aeI > mUOm} 

the hierarchical rank vector associated with u. 

In addition, for the following discussion let {U "jj. Y k=1 a , U^ e hi^^^^), be the 
left singular vectors and o^ be the singular values of X^ . In analogy to the Tucker 
format we denote by 

U(u) = U n (u) := {{U^}^^ : a E V m , defined by J27(} (28) 



the system of orthonormal hierarchical mode frames with rank vectors rank(u) 



This in turn allows us to define, in analogy to (13), the set of tensors in hierarchical 
format with ranks r = (r a ) a( zv m \{o m } as 

U(r) := {u E £ 2 (V d ) : rank Q (u) < r a for all a E V rn \ {0 m }} . (29) 

The observation that the specific systems of hierarchical mode frames U(u) have the 
following nestedness property, including the root element, will be crucial. 

Proposition 1. For u E ^2(V rf ) and a E J\f(T> m ), the mode frames {UJl } given by the 



left singular vectors of the operators Tu defined in (27) satisfy 



(a) 



span{UJj. : k = 1, . . . , rank a (u)} C span{U^ : k = 1, . . . , rank Cl ( a )(u)} 

<g> span{U^. C2(ajj : k = 1, . .. , rank C2 ( a )(u)} , 

i.e., the family of left singular vectors of the operators T^ is comprized of orthonormal 
and nested mode frames for u. 

Proof. The statement is shown in the more generally applicable framework of minimal 
subspaces in [18] (cf. Corollary 6.18 and Theorem 6.31 there). For completeness, we give a 
self-contained proof for our particular setting. Note that, by the properties of the singular 
value decomposition, we have 

range T„ = spanjUJ, : k = 1, . . . , rank Q (u)} 

for all a E T> m . We now fix an a E J\f(T> m ). By construction of the mode frame U' a \ we 
have the singular value decomposition 

rank Q (u) 

E(«) TT (a) ~ ,,(a) 

fe=l 
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Furthermore, for each k we may perform a further singular value decomposition to obtain 

oo 

U< o) = 5>wg ® Wg, Wg 6 / a (V E ^W*), J = 1,2 , 

£=1 

where ||(o^)|| = 1, and the quantities on the right hand side still depend on a. This yields 
the representation 

kl 



By orthonormality of the involved basis vectors we have 

^ cl(Q)) [wi>vf)] = ^)w«, 



and hence 



r(*) C TT7TTT7T^T-'( C «( Q )) — o^^ /TrfeM) , 



Wg G range TA cuq;j = span{U^ w; : A; = 1, . . . , rank Ci(a) (u)} 

for i = 1,2. Consequently, any element of span-fU^ : k = 1, . . . ,rank Q (u)} can be ex- 
panded in {W$ Wg} fc ,| and thus also in {u£ l(a)) ® UJj (a)) } fel ,jfc 2 . D 

By orthonormality and nestedness, we obtain for each a G A/"(£> m ) an d fc = 1, . . . , rank Q (u) 
the expansion 

rank ci{a) (u) rank C2(ct) (u) 

<= E E <U< a \u<^ a »®<^>U^®Ij£'W ) . (30) 

fc 1= l fc 2 =l 

Defining the matrices B( a,fc ) G £2 (IN x IN) with entries 

<t ): =< U i a) '< (a)) ^< (a)) >» ( 31 ) 



( 30 ) can be rewritten as 



rank Cl(a) (u) rank C2(a) (u) 

Ui a) = E E *££<<"» «<<"», (32) 

fc 1= l fe 2 =l 

providing a decomposition into vectors U? , & = 1,2, which now involve shorter multi- 
indices supported in the children Ci(a). This decomposition can be iterated as illustrated 
by the next step. Abbreviating Cjj(a) = Cj(cj(a)), one obtains 



rank ci(a) (u) rank C2(a) (u) 

(ki,k 2 ) 



ur»= E E E *"*> 



fel=l fea=l fcj,l,fci,2fej,l,fej,2 

(M)e{i,2} 2 ' 
x B^B^J^U^ 1 ^ ® U^; l(a)) ® U^; 2(Q)) ® Ui c - 2(a)) . (33) 

Applying this recursively, any u G ^2(V d ) can be expanded in the form 

ranki(u) rank m (u) 

"= E ••■ E ^ 1 ,...^ U l 1 >---®< ) ; (34) 
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where the core tensor a has a further decomposition in terms of the matrices E$( a,fc ) for all 
non-leaf nodes a and k = 1, ... ,rank a (u). This decomposition can be given explicitly as 
follows: For each (k a ) a ^x> m , we define the auxiliary expression 



B, 



{fcajc 



TT b, . 

11 (k ci(m ,k, 

P&M(V m ) 



02 OS) 



)■ 



We now use this to give an entrywise definition of the tensor S^ m ({B ( - a ' fc '}) e ^2(lN m ), 
for each tuple of leaf node indices (kp)p e £(p m ) £ IN*^ 25 ™), as 



(^({B^ : a € Af(V m ), k = 1, . . . ,rank a (u)} 



( fc /3)/3G£(X> m ) 



£ 



B 



(fe 7 ) 7 gD 71 



(35) 



(fci)*ei(x>m) 

5=l,...,rank^(u) 



Note that the quantity on the right hand side involves a summation over all indices cor- 
responding to non-leaf nodes. Since the summands depend on all indices, this leaves 
precisely the indices corresponding to leaf nodes as free parameters, as on the left hand 
side (recall that the index for the root of the tree is restricted to the value 1). The tensor 
defined in ( |35[ ) then equals the core tensor a, which is thus represented as 

a = S Cm ({B^ fc ) : a G Af(V m ), k = 1, . . . , rank a (u)}) . (36) 

This representation is illustrated explicitly for m = 4 in Example [l] below. 

Example 1. We consider m = 4 with £> 4 = {{1, 2, 3, 4}, {1, 2}, {3, 4}, {1}, {2}, {3}, {4}}. 
For this example, we use the abbreviation r a := rank Q (u) and derive from (33) the ex- 
pansion 

ri r 2 r 3 r A r {l,2} ^{3,4} 

~~ 2^i 2-^< 2-^ 2-^ 2-^ Z^ (*{i,2}>*{3,4}) 

fcl=l /C2 = l fc 3 = l fc 4 = l fc{! 2 }=1 fc{ 3 4 }=1 

that is, for the core tensor we have the decomposition 

f{i,2} r {3(4} 

({1,2,3,4},!) n ({l,2},fe { i, 2 }) D ({3,4},fc {3i4} ) 



(3) 



U 



(4) 
fc 4 ' 



V^ Y" R ({1,2,3,4},1) R Ui, 

a klMMM - 2-^ 2^ (fc{l,2},fc{3,4}) (fci, 



A.-2) 



s 



(fc3,fc4) 



c {l,2} = lfc {3,4} 



Example 2. A tensor train (TT) representation for m = 4 as in Example [I] would cor- 
respond to £> 4 = {{1, 2, 3, 4}, {1}, {2, 3, 4}, {2}, {3, 4}, {3}, {4}}, i.e., a degenerate instead 
of a balanced binary tree. More precisely, the special case of the hierarchical Tucker for- 
mat resulting from this type of tree has also be considered under the name extended TT 



format 29 



Following 13 18], we can formulate now the analogue to Theorem [11 

Theorem 3. Let u G ^(V ), let T> m be a dimension tree, and let r = (r a ) with < r a < 
rank a (u) for a E T> m \ {0 m } ; then there exists v S 'H(r) such that 

||u — v|| = min{||u — w|| : rank a (w) < r a , « G V m \ {O m }} . 
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2.2.2 Hierarchical Singular Value Decomposition 

There exists a decomposition for the hierarchical Tucker format that can be regarded 
as a generalization of the HOSVD, which we shall refer to as hierarchical singular value 
decomposition or %SVD. The finite-dimensional versions of the following claims have been 



established in 15 . All arguments given there carry over to the infinite-dimensional case 



as in the proof of Theorem [2] 

Theorem 4. Let u G ^(V ), where d = d\ + . . . + d m , and let T> m be a dimension tree. 
Then u can be represented in the form 

u = Y, fl kU k , a = E Vm ({B^ : a G M{V m ), k = 1, . . . , rank a (u)}) 

keM m 

with a G h{y d ), B( Q ' fc ) G 4(M x M), /or a G N{V m ), k G M, and w/iere i/ie following 
hold: 

(i) (UJ?, U, (<) ) = <Jfcj /or t = 1, . . . , m and k, I G IN; 

(ii) rank 0m (u) = 1, HB^ "" 1 )]! = [|u[|, and B(°™' fc ) = for k > 1; 

(iii) (B^B^) = S k i for a G I{V m ) and k,l G M; 

(iv) /or aW i G {1, . . . , m} we have (o^, )fce!N G ^2 QN) ; and er^ > cS.i — f or a ^ k £ ^/ 

(v) /or a// i G {1, ... , m} we /iawe ap ? = lop I <5 P<? , 1 < p, a < rankj(u). 

We also have the following analogue of Corollary wl see [15] . 

Theorem 5. Let u G ^2(V d ). For an TiSVD of u as in TheoremYA and defining u by 
truncation to hierarchical ranks r = (f a ), we have 

rank Q (u) 
II -Il2 ^ V^ V^ I (a)|2 

ii u - u ii < 2^ 2^ K J | 

aSXVi k=f a +l 

and 

||u — u|| < y/2m — 3 inf{||u — v|| : v G 'H(r)} . 

Remark 3. Suppose that, in analogy to Remarkyj a compactly supported vector u on V rf 
is given in a possibly redundant hierarchical representation 

m 

u= Y, «k(g)ug, a=E Dm ({B<«*-)}) ) 

keK m (r) t=l 

where the summations in the expansion of a range over k a = 1, . . . , r a for each a, and 
where the vectors \J k , k = 1, . . . , fj, and B' Q ' ' , k = 1, . . . , f a , may 6e linearly dependent. 
Employing standard linear algebra procedures, an TiSVD of u can oe computed from such 
a representation, using a number of operations that can be estimated by 

m 

Cm ( max r a ) +C(maxfj) N #suppj(u), (37) 

aex> m \{o m } i r—f 

1=1 



where C > is a fixed constant, cf. 15, Lemma 4-9]. 
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2.2.3 Projections 



In analogy to Section 2.1.3 it will be important to associate suitable orthogonal projections 
with a given system V of nested orthonormal mode frames. To this end, r = (ra)aeT> m \{o m } 
stands always for a rank vector for the hierarchical Tucker format. Again r a = oo is 



permitted. We begin with introducing an analog to (24), with a slightly more involved 



definition. The hierarchical V -rigid tensor class of rank r is given by 

«(V, r) := {w : range li" C spanfVJ^ :fc = l ) ...,r a } 1 a62) m \ {0 m }}, (38) 



» 



where T^> is defined by (27). Clearly H(V,r)cH(r). 



In analogy to Section 2.1.3 we address next a truncation of hierarchical ranks to r < r for 
elements in H(V, r), when V is a given system of orthonormal and nested mode frames with 
ranks r. There is, however, a slight complication due to the fact that the resulting mode 
frames restricted to lower ranks r need no longer be nested. Nevertheless, an approximation 
with restricted mode frames can still be realized through an operation represented as a 
sequence of projections involving the given mode frames from V. However, the order in 
which these projections are applied now matters. 

In a way the proof of Lemma [T] already indicates how to proceed, namely restricting 
first on lower "levels" of the dimension tree. To make this precise we denote by V>^ m the 
collection of elements of V m that have distance exactly t to the root (i.e., V^ = {0 m }, 
2-m = { c i(0m) ; C2(0 m )} and so forth). Let L be the maximal integer such that V^ ^ 0. 
For £ = 1, . . . , L, let V^ := [J{i G a : a S Dj,}. Then, given V, and abbreviating 

Pv, a , ? :=E<vi"V>vi Q) , 



we define 

\e{i,...,m}\vL «6^ 



with L denoting the identity operation on the i-th tensor mode. Then, as observed in 15 



the truncation operation with mode frames V restricted to ranks r can be represented as 

Pv,r : = Py,L,f ■ ■ ■ -FV,2,r -fV,l,r • (39) 

Here the order is important because the projections Pv,Q,?)Pv,^,r corresponding to a, (3 £ 
T> m with a C /3 do not necessarily commute. Therefore a different order of projections 



may in fact lead to an end result that has ranks larger than r, cf. 15 . 

Specifically, given u G ^(V d ), we can choose V = U(u) provided by the %SVD, see 



(28). Hence Pu(u),? u gives the truncation of u based on the %SVD. For this particular 
truncation an error estimate, in terms of the error of best approximation with rank r, is 
given in Theorem [5j 



Remark 4. By ( 1391 ), we have a representation ofu := Pu(u),r u i- n terms of a sequence of 
non-commuting orthogonal projections. The result u may again be represented in terms of 
the orthonormal and nested mode frames U := U(u). 

The situation simplifies if we consider the projection to a fixed nested system of mode 
frames, without a further truncation of ranks that could entail non-nestedness. 
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Lemma 1. Let V be a family of orthonormal and nested hierarchical mode frames with 
ranks r. Then there exists a linear projection Py: ^2(V d ) — > %(V, r) such that the unique 
best approximation in %(V, r) of any u G -^(V^) is given by Pyu, that is, 

llu — Pyu||= min llu — wll . 

w£W(¥,r) 

Proof. The sought projection is given by Py = -Py,i,r, since 

holds as a consequence of the nestedness property. D 

Corollary 3. For u G ^(V ) and r = (r a ) a ^x> m with < r a < rank a (u) ; there exist 
orthonormal and nested hierarchical mode frames U(u, r) such that 



111— PfTf,, /v u|| = min llu — wl 
V(u ' n we«(r)" 



with Pfj( u r \ as in Lemmaui 

Proof. By Theorem [3j a best approximation of hierarchical ranks r for u, 

u G argmin{||u — v|| : rank a (u) < r a } , 

exists. Defining U(u, r) := U(u) as the nested and orthonormal mode frames for u, given 
by the %SVD, we obtain the assertion with Lemma [T] □ 

3 Recompression and Coarsening 

As explained in £ |1.2[ iterations of the form (19J) provide updates v = u^ + cj(f — Auj) 
which differ from the unknown u by some known tolerance. However, even when using a 
"tensor-friendly" structure of the operator A or a known "tensor-sparsity" of the data f , 
the arithmetic operations leading to the update v do not give any clue as to whether the 
resulting ranks are close to minimal. Hence, one needs a mechanism that realizes a subspace 
correction leading to tensor representations with ranks at least close to minimal ones. This 
consists in deriving from the known v a near best approximation to the unknown u where 
the notion of near best in terms of ranks is made precise below. Specifically, suppose that 
v € &(V ) is an approximation of u £ ^(V ) which for some n > satisfies 

ll u - v IU(vd) <V- ( 40 ) 

We shall show next how to derive from v a near-minimal rank tensor approximation to 



u. Based on our preparations in Sections 2.1.3 2.2.3, the following developments apply to 



both formats J- G {T,H}, in fact to any format J- with associated mode frame systems 
U = U (see (20), (28)) for which one can formulate suitable projections Py,PY ? with 



analogous properties. Accordingly, 1Z = IZj?, T G {T, H} denotes the respective set of 
rank vectors TZj- := IN™, TL^i := T> m \ {0 m }. A crucial role in what follows is played by 
the following immediate consequence of Corollaries [2j [3] combined with Theorems [2j [5j 

Remark 5. Let for a given v G ^(V^) the mode frame system U(v) be either U' (v) or 
U (v). Then, for any rank vector r < rank(v), r G TZ, one has 

ll v - P U(v),r v ll < K PH v - P U(vr) v ll = K P min ll u - w ll> ( 41 ) 

' rank(w)<r 

where Kp = yfm when T = T ', and Kp = \/2m — 3 when T = H. 
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3.1 Tensor Recompression 

Given u G -^(V^), in what follows by U(u) we either mean U 7 "(u) or U^(u), see (20) 



(28) 



We introduce next two notions of "minimal ranks" r(u, 77), r(u, 77) for a given target 
accuracy 77, one for the specific mode frame system U(u) provided by either HOSVD or 
%SVD, and one for the respective best mode frame systems. 

Definition 3. For each 77 > we choose r(u, 77) G H such that 

l|u-Pu(u),r(u,jj)U|| < 7? 
with minimal |r(u, t?)]^ that is, 

r(u,7/) G argminjlrloo : r £ TZ, ||u - P u(u ) r u|| < 77} . 
Similarly, for each 77 > we choose f (u, 77) G 1Z such that 

llu-Pucu^u,^)) 11 !! < v, 

with minimal |f(u, 77)! 00, that is (see Corollary ^1 and Remark pi), 

f(u,77) G argminjlrloo : r G K, 3 w G T(r), ||u - w|| £2 ( V d) < 77} . (42) 



Recall that the projections Pu(v),r = Pu(v) r *° ^ 7 ( r ) are gi yen either by (|26j) or (39) 
when T G {T,H}, respectively. In both cases, they will be used to define computable 
coarsening operators for any given v (of finite support in V ). In fact, setting 

P V V := Pu( v ),r(v,i,) v , ( 43 ) 

we have by definition 

||v-P^v|| < 77, |rank(P^v)|oo = |r(v,77)|oo. (44) 

Lemma 2. Fix any a > 0. For any u, v, 77 satisfying ( |40[ ), i.e. ||u — v|| < 77, one has 

ll u - P*p(i+a)i7 v|| < (1 + «p(l + a))rj (45) 

7y/wZe 

|rank(P Kp ( 1+Q)r; v)^ = |r(v, « P (1 + 01)77) I,*, < |f(u, arf)^ . (46) 

Proof. Bearing Remark [Bl in mind, given u, for the projection Pf}( u f (u ari)) one nas 

ll V - P U(u,r(u,ar,)) v ll < II ( J ~ P U(u,f(u,«j)))( v ~ u ) II + ll u ~ P U(u,r(u,^)) u ll < (! + «)??• (47) 

On the other hand, we know that for any r G 1Z, 

ll v _ PlU(v) r v ll < K ~P hlf ||v-w||, 
wef(r) 



so that, by (47), for r = r(u, an) we have 

ll v - P U(v),f(u,ar?) V|| < Kp(l + a) 77, 



which confirms (46). Estimate (45) follows by triangle inequality. □ 
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Thus, appropriately coarsening v yields an approximation to u of still the same quality 
up to a fixed (dimension-dependent) constant, where the rank of this new approximation 
is bounded by a minimal rank of a best Tucker or hierarchical Tucker approximation to u 
for somewhat higher accuracy. 

Let us reinterpret this in terms of minimal ranks, i.e., for r £ 1Z, let 

o-r(v) = cr r ,j-(v) :=inf{||v-w|| : w e J"(r)}, F e {T,H} . 
We now consider corresponding approximation classes. 

Definition 4. We call a positive, strictly increasing 7 = (7(71)) _. with 7(0) = 1 and 
7(n) — > 00, as n — > 00, a growth sequence. For a given growth sequence 7, we define 

A(n) = A T (i) := {v e ^ 2 (V d ) : sup7(|r| 00 ) - r ,^(v) =: |v|^ w } 

reft 

and HvH^^) := ||v|| + Ivl^^). We call the growth sequence 7 admissible if 

p y := sup j(n)/j(n — 1) < 00 , 
new 

which corresponds to a restriction to at most exponential growth. 

In the particular case when 7(72) ~ n s for some s > 0, HvH^m-y) := ||v|| + M^m^.) 
is a quasi-norm and AfIOt) is a linear space. 

Theorem 6. Let Kp be as in Remarkvty and let a > 0. Assume that u e .4^(7) and i/iai 
v € ^(V d ) satisfies ||u — v|| < r/. Then, defining w^ := P Kp (i+ Q .) r? v, one /zas 

|rank(w^)|oo < 7 _1 (p 7 ||u||_4^( 7) /(ar/)), ||u - w^|| < (1 + « P (1 + a))r], (48) 

and 

||w^||^ (7) < C||u|U^ (7) , n > 0, (49) 

^ere C = 1+Kp(1+a) + 1. 

Proof. The second relation in d48j) has already been shown in Remark k] We also know 
from d46j) that |rank(w^)| 00 < |f(u, ar/)|oo- Thus the first relation in ( |48[ ) is clear if 
|f(u, an)|oo = 0. Assume that I f(u, ctn)!^ > 1. Then for any r' with | r' | 00 = |f(u, an)|oo — 1, 
by definition of H^r-y) we have 

|u|Ar(7) - 7 G r ' I oo)cr r ',jr(u) > 7(|r'|oo)« r / > P7 _1 7(|f(u,a77)| 00 )a77. (50) 

Also, when |r(u, a»7)|oo = 1, we have 

o- ,r(u) = ||u|| > ar] = 7(0)077 > p 7 _1 7(|f(u, an)|oo)a77 . 

Therefore 

|f(u, aT/)!^ < 7" 1 (p 7 ||u||_ 4 ^ (7) /(a?7)), 

which is the first relation in ( |48[ ). 

As for the remaining claim, we need to estimate 7(|r| 00 )cr r) jr(w 7? ). Whenever |r|oo > 
|f(u, a77)|oo we have, by (46), o- r ^(w v ) = 0. So we need to consider only |r|oo < |f(u, arj)^. 
By d45l, 



w 



v - p C(u,r) u ll < \\ w v - u|| + ||u - Pfj( u ,r) u ll < (1 + «p(1 + a))rj + o- r ,j-(u). 
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Therefore, we conclude that for |r|oo < |r(u, ar))\oo, since cr f ( uav ^jr(\i) < an while cr r jr(u) > 
an, 

/, , x / /, , s (1 + Kp(l + a))ar] ,. . . . . 

7 »K/K < 7 oo- — + tNoovW 

a 

/l + Kp(l + a) \ .. , . 
< ( ^ " + 1 J 7(k|ooK^(u) 



< 



/l + K P (l + a) V . s 

( ^ Z + lJ|uU,( 7 ). (51) 



a 

3.2 Coarsening of Mode Frames 

We now turn to a second type of operation for reducing the complexity of given coefficient 
sequences in tensor representation, an operation that coarsens mode frames by discarding 
basis indices whose contribution is negligible. We shall use the following standard notions 
for best iV-term approximations. 

Definition 5. For dsl and A C V rf , we define the restrictions 

Rav:=v X a, ve^), 
and for each iV £ Mo, the errors of best TV-term approximation 

crj\r(v) := inf ||v — Rav||. 

ACV d 
#A<iV 

The compressibility of v can again be described through approximation classes. For s > 0, 
we denote by A s (V d ) the set of v G ^(V d ) such that 

[l v IL*rv<*1 := sup ( N + l ) Sa N(v) < oo . 
"* (V > iVeiNo 

Endowed with this (quasi-)norm, A s (V d ) becomes a (quasi-)Banach space. When no 
confusion can arise, we shall suppress the index set dependence and write A s = A s (V d ). 

We deliberately restrict the discussion to polynomial decay rates here since this cor- 
responds to finite Sobolev or Besov regularity. However, with appropriate modifications, 
the subsequent considerations can be adapted also to approximation classes corresponding 
to more general growth sequences. 

3.2.1 Tensor Contractions 

Searching through a sequence u G ^(V rf ) (of finite support) would suffer from the curse of 
dimensionality. Being content with near best iV-term approximations one can get around 
this by introducing, for each given u G ^2(V rf ), the following quantities formed from 
contractions of the tensor u ® u. 

Definition 6. Let u e &(V ). For i 6 {1, . . . , m} we define, using the notation (15), 
^)(u)=(vr«(u))^ vdi :=((EK| 2 ) 1 ) 



iev d i 



2:; 



With a slight abuse of terminology, we shall refer to these tt^'(-) simply as contractions. 
Their direct computation would involve high-dimensional summations over the index sets 
X7 d ~ di . However, the following observations show how this can be avoided. This makes 
essential use of the particular orthogonality properties of the tensor formats. 

Proposition 2. Let u € ^(V d ). 

(i) We have ||u|| = ||7rW(u)||, i = 1, ... , m. 

(ii) Let A® CV 4 , then 



m i 

iu-R N v,»)u||<(j Y, I^V)! 2 )"- ( 52 ) 



L AWx-xA( m ) 



(iii) Let in addition U' 2 ' and a be mode frames and core tensor, respectively, as in Theo- 
rems^ or\A and let (a k ) be the corresponding sequences of mode-i singular values. 



Then 

vr«(u)=(j:|U«| 2 |^| 2 ) 1 , ,6V*. (53) 

k 

Proof. Property (i) is clear, and (iii) is a simple consequence of the orthogonality prop- 
erties of mode frames and core tensor stated in Theorems [2] and [4} Abbreviating ii := 
R A (i) x ... xA (m) u, property (ii) follows, in view of (i), from 

II*" II" *s* 1 1 TJ 1 1 ™ i ill TD 1 1 " 

|| u — u || S || u ~~ - t *-A( 1 )xV d 2x---xV d ™ U H + • • • ' ll u — Jri 'V d ix---xV dm - 1 xA( m ) U H 
m 

= E E I^H! 2 - D 

i=l v gv^\AW 

The following subadditivity property is an immediate consequence of the triangle in- 
equality. 

Proposition 3. Let N G IN and u n G ^2(V d ) ; n = 1, . . . ,N. Then for each i and each 
v £ V , we have 

N N 



*£ > (X>)^£'tfW 



n=l n=l 



Relation ( 53 ) allows us to realize (in practice, of course, for finite ranks rank(u) and 



finitely supported mode frames U' 1 ') best iV-term approximations of the contractions 
7P l )(u) through those of the mode frames UL . Moreover, expressing coarsening errors in 
terms of tails of contraction sequences requires finding good Cartesian index sets. To see 
how to determine them let us suppose that for each i e { 1 , . . . , m} , we have ordered 

n%{n)>n%(n)>--->n%{n)>.... 
Then we can also order these quantities jointly for all tensor modes, 

7r£&(u)>*Jg a (u)>.... (54) 

Next, retaining only the N largest from the latter total ordering and redistributing them 
to the dimension bins 

A^(u;N):={^^ :i,=i,j = l,...,N}, i = l,...,m, (55) 
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the product set 

m 

A(u;JV) := X A«(u;A) (56) 

i=i 

can be obtained at a cost that is roughly m times the analogous low-dimensional cost. By 
construction, one has 



;r>A«(u;A0<iV 



1=1 



and 



E E i^hi 2 

j=l i/eV d »\AW(u;AT) 



mm 

A 



E E i^V)i 2 }, 

i=1 i/£V <i i\A( 1 ) 



where A ranges over all product sets Xf i ^ with X^2=i $"■ < A. 
Proposition 4. For any u £ ^(V ) one /ias 



(57) 



(58) 



|U- ^(u;^)"!! 2 < E E 



7T 



CO, 



u)f 



an 



8=1 ^eV d i\A(»)(u;Af) 

d /or any A = X™ x AW with A« c V d * satisfying YJT=i # aW < ^ one ^ as 



I" - ^A(u;AT)U|| < \/m||u - % U|| . 



(59) 



(60) 



by (59) and (58), one obtains 



Proof. The bound (59) is immediate from (52). Let now A be as in the hypothesis, then 



|u-« A (u;Af)u|| 2 < E E K 4) ( U )| 2 = ll u - R A(i)xV d 2x-xV d ™ U H 2 + --- 

«=i i/gv d i\AW 



+ ||U — Ryd! 



V d l X-XV d m-l xA( m ) 



ull < mllu — R; ull 



□ 



Note that the sorting ( 54 ) used in the construction of ( 56 ) can be replaced by a quasi- 



sorting by binary binning; we shall return to this point in the proof of Remark [9j With 
the above preparations at hand we define the coarsening operator 



C u> n v := Ra( u; a/) v , v G £ 2 (y° 



(61) 



While C Uj at is computationally feasible, it is not necessarily strictly optimal. However, 
we remark that or each N £ IN, there exists A(u; N) = X» A' 4 '(u; N) such that the best 
tensor coarsening operator 



C u ,7vv := R- Mu , N) v , vel 2 (V d ), 



realizes 



|u-C Ui ivu 



mm u — w 

Ei#sup Pi (w)<AT 



(62) 



(63) 



The next observation is that the contractions are stable under the projections Pu^( u ) 
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Lemma 3. Let u G ^(V d ). For i E {1, . . . ,m}, v G V d % and any ran/: vector r GH wra'ia 
r < rank(u) componentwise, we have 

7r«(Pu(u),rU)<7T«(u), 

where U(u) either stands for U (u) or U (u) /or t/ie Tucker and hierarchical Tucker 



format, respectively, see (20), (28). 



Proof. We consider first the Tucker format. Using the orthnormality of the mode frames 
U(u), we obtain 

(vr«(P u(u) , r u)) 2 = £ (fXU) 2 . (64) 



For any fixed kj, we have 



EEW\^ = (E ^1%i J ^ °. ( 65 ) 



rC q — _1_ £ q — _1_ fvo — -L 



and combining this with (64), with the notation R := rank(u), we obtain 



By Theorem [2jh) , the right hand side equals 

This shows the assertion for the Tucker format. The proof for the hierarchical Tucker 
format follows similar lines. We treat additional summations arising in the core tensor in 



the same way as the summation over kj above, and apply the same argument as in (65) 



recursively. □ 

As a next step, we shall combine the coarsening procedure of this subsection with the 
tensor recompression considered earlier. To this end, we define N{y,n) : = minjiV: ||v — 
C v ,tv v|| < 77} as well as 

Cjj(v) := C V) jv( v;j? ) v , (66) 

in order to switch from (near-)best iV-term approximation to corresponding (near-)best 



thresholding procedures. As a consequence of (59), we have 



||v — C v ,jv v|| < kc||v — C v ,./vv||, Kq = y/m. (67) 

Our general assumption on the approximability of mode frames is that 7fW (u) G A s which, 
as mentioned before, reflects finite Sobolev or Besov regularity of the lower-dimensional 
tensor factors. 
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3.2.2 Combination of Tensor Recompression and Coarsening 

Recall that we use || • \\a», \\ • \\at(i) *° <l uan tify sparsity of wavelet expansions of mode 
frames, and low-rank approximability, respectively. The following main result of this 
section applies again to both the Tucker and the hierarchical Tucker format. It extends 
Theorem [6] in combining tensor recompression and wavelet coarsening, and shows that 
both reduction techniques combined are optimal up to uniform constants, and stable in 
the respective sparsity norms. 

Theorem 7. For a given v G ^2(V rf ), let the mode frame system U(v) be either U' (v) or 
U w (v) (see Q, (|8j),). Let u, v G h{^ d ) with u G Ar{l), n^(u) G A s for i = 1, . . . ,m, 
and || u — v|| < r\. As before let up = kq = y/m for the Tucker format, while for the 
%- Tucker format up = \j2m — 3 and kq = \Pm. Then for 



w, 



C Kc ( Kp +l)(l+a)r ? (PKp(l+a)7 7 (v)) 



we have 
as well as 



u - w^|| < (l + kp(1 + a) + kc(kp + 1)(1 + a))rj, 

llw^H^^) < CiHull^^), 



|rank(w^)| 00 < 7 1 (p~ / \\u\\ AAl) /(an)) , 
with C\ = (a _1 (l + Kp(l + a)) + 1) and 



(68) 
(69) 

(70) 



111 in 

^#supp i (w ry ) <mrr* (V||tt«(u)IU 



i=\ 



x;ik«(w,)iu s <c 2 x;ik (i) (u)iu s , 



(71) 



i=\ 



i=l 



«>?, 



tfi C 2 = 2 S (1 + 2 s ) + 2 2s (k p (1 + a) + «c(«p(l + a) + 1 + a))m max 'f 1 ' s >. 



Proof. Taking (45) in Lemma 2 and the definition (66) into account, the relation (69) 



follows from the triangle inequality. 



The statements in (70) follow from Theorem 6l Note that the additional mode frame 



coarsening considered here does not affect these estimates. 



For the proof of (69) and (71), we can proceed similary to |8l Corollary 5.2] (see 



also 7| Theorem 4.9.1]). We set w := P rep (i+ a )?7( v )- Let N G IN be minimal such that 
||u — C Uj tv"u|| < an. Then 

||W - C u ,Afw|| < ||(I - C U)A r)(u - w)|| + ||u - C Ui Aru|| 

< ||u — w|| + ||u — C Ui atu|| < (l+Kp(l + a) + a)n , 

where we have used Remark [2] to bound the first summand on the right hand side. Con- 
sequently, by (67), 



|w — Cw,iv w|| < Atc||w — Cw,atw|| < kc||w — C U) atw|| < Kc(l+Kp(l + a) + a)n . (72) 



Furthermore, keeping the definition (|62[) and the optimality (|63[) in mind, (52) yields 



m 1 

|U " C HlJ >rU|| < inf (V||7T«(U) - R Ai 7T«(U)| 12 '' " 

m m 

^£*a%/ lk W (u)-R A ^«(u)||<(iV/m)-^||vr«(u) 



U s • 



j=i 



i=l 
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Using the latter estimate and noting that the coarsening operator C K r (i +K P )(i+ a )v retains 



at most N terms by ( 72 ) , we conclude that 



JT #sup Pi (w^)<iV< v~'m(£lh® (u)IU- 

i=\ i=\ 



and hence the first statement in (71 ). Resolving the latter inequality for 77, one can rewrite 



(|69J) as 

m _ m 

(E #su PPi (w^)) S C(a)m 3 (jj^Wn® (u)\\ A .) , (73) 



|u — wJ| < 



%=\ j=l 



where C(a) := k p (1 + a) + «c(«p(1 + a) + 1 + a). Now let N = YT=i &% with JVj := 
^supp^w^). Let fij be the best iVj-term approximation to 7rW(u), then 

IK«(W,)|U. < 2 S (||U,|U, + HU, - 7T«(W„)|U.) 

< 2 s (||7r«(u)|U s + (2^)11^ - 7rW( Wf7 )||) 

< 2 S (||7T«(U)|U S + (2iV i ) S (||u i - 7T«(U)|| + ||7rW(u) - ^ W (w 



< 2 S ((1 + 2 s )||vr«(u)|U, + (2^) s ||7r«(u) - tt«(w 






where we have used that ||uj — 7P*)(u)|| < iV^ s 1 1 7r^^ (u) 1 1 _^s as well as #suppj(w^), 
^supp^vr^^w^)) < N{. Moreover, as a consequence of the Cauchy-Schwarz inequality, 
we have the componentwise estimate 

|vr«(u)-vr«(w,)|<vr«(u-w, ? ), 

which combined with ( |73| ) yields 

||7T«(U) - 7rW(w„)|| < [|tT«(u - W„)|| = ||u - W„|| 



< N~ s C(a)m s (y2h {k) (. u )\l 

k=i 

Hence we obtain 

m 

IK (i) K)IU* <r(l + 2 s )\\7r( l \u)\\ A s+2 2s C(a)m s N- s Nf(^h {k) ^)\U). 

fc=i 

Summing over i = 1, . . . , m and noting that N~ s X^i N£ < m , max { > 1 - s } ) we arrive at the 



second assertion in ( 71 ) . D 



4 Adaptive Approximation of Operators 

Whether the solution to an operator equation actually exhibits some tensor- and expansion 
sparsity is expected to depend strongly on the structure of the involved operator. The 
purpose of this section is formulate a class of operators which are "tensor-friendly" in the 
sense that their approximate application does not increase the rank too much. Making 
this precise requires some model assumptions which at this point we feel are relevant in 
that a wide range of interesting cases is covered. But of course, many possible variants 
would be conceivable as well. In that sense the main issue in the subsequent discussion 
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is to identify the essential structural mechanisms that would still work under somewhat 
different model assumptions. 

We shall approach this on two levels. First we consider operators with an exact low rank 
structure. Of course, assuming that the operator is a single tensor product of operators 
acting on functions of a smaller number of variables would be far too restrictive and also 
concern a trivial scenario, since ranks would be preserved. More interesting are sums of 
tensor products such as the m-dimensional Laplacian 

A = dl 1+ ... + d 2 Xm , 

where strictly speaking each summand d x . is a tensor product the identity operators acting 
on all but the jth variable and the second order partial derivative with respect to the jth 
variable. Hence the wavelet representation A of A in an L2-orthonormal wavelet basis has 
the form 

A = Ai ® l 2 ® • • • ® lm H h Ii ® ■ ■ ■ ® Im-l ® A m , (74) 

where Aj is the wavelet representation of d x -. There is, however, an issue concerning the 
scaling of the wavelet bases. For L2-orthonormalized wavelets A is not bounded, an issue 



to be taken up later again in Remark 15 



At a second stage it is important to cover also operators which do not have an explicit 
low-rank structure but can be approximated in a quantified manner by low-rank opera- 
tors. A typical example are potential terms, such as those arising in electronic structure 
calculations, see, e.g., [2] and the references cited there. 

4.1 Operators with Explicit Low-Rank Form 

We start with a technical observation that will be used at several points. Given operators 
B (i) = (6^-.)^-.^. : £ 2 (V di ) -)• £ 2 (y dl ), recall that their tensor product B = B^®- • -® 
j$(m) j g gi ven ^ 3 . _ fc > . . . b, m '~ so that, whenever v = v 1 ®- • -(giv 771 , v- 7 G ^2(V dj ), 
we have Bv = (B^v 1 ) 8) • • • (8) (B( m )v m ). Observing that for any v G 4(V) 

Bv = (ii (g) B^ ® • • • ® B^) ((B^ 1 ) ® I 2 <8) • • • <8) I m ) v) , (75) 

we conclude 

||Bv|| < ||B (2) ®---®B (m) || ||7r (1) ((B (1) ®I 2 ® ■■■®I m )v)|| 
More generally, one obtains by the same argument 

||Bv|| < ||BW®---®B( i - 1 >®B< i+1 >®-"®B< T, 0|| 

x ||7rW ((Ii ® I^! ® B« ® Ii+i ® • • • ® I m )v) || . (76) 



Similarly, one derives from (75) the inequality 

vr (i) (Bv)^ < ||BW®---®B( i - 1 )®B( i+1 )®---®B( m )|| 

X7r (i) ((Ii®---®I i _i®BW®I m ®---®I m )v) iy ., UieV di . (77) 
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4.1.1 Tucker Format 

We shall be concerned first with (wavelet representations of) operators A = {av,v)vv£\J d : 
^2(V a! ) — > ^(V d ) composed of tensor products of operators according to the Tucker format. 
For a given rank vector R E W m throughout this section we assume that A: ^(V ) — > 
&2(S d ) ls bounded and has the form 

m 

A = E c "(8) A ^' ( 78 ) 

nGK m (R) i=l 

where A§ : £ 2 (V d >) -»■ £ 2 (V d ') for t E {1, . . . , m} and n, E {1, . . . , #;}. 
Example 3. In particular, any operator of the form 

Ai g) I 2 8) • • • ® I m + ... + Ii ® • • • ® Im_i (X) A m 

can be written in the form (78) with R = (2, . . . , 2), A± = Ij, A2 = A, for i = 1, . . . , m, 



and core tensor 

C2,i,...,i = • • • = ci,...,!^,!,...,! = • • • = ci,...,i, 2 = 1 , c n = otherwise. 

(i) 

The A„/ are in general infinite matrices and not necessarily sparse in the strict sense. 
We shall further require, however, that they are nearly sparse as will be quantified next. 
To this end, suppose that for each A„/ we have a sequence of approximations (in the 
spectral norm) such that for a given sequence £nl,p, p E IN, of tolerances 



ll A g- A £[p]ll<€p, PeMo. (79) 

(i) 

Moreover, it will be important to apply such sparsified versions of the A n ' to vectors 
which are supported on the elements of a partition {A n , i} p ew 0I ^ di - We shall then 
consider approximations A to A of the form 



E c n (g)A«, A«:=£AW W R (80) 

neK m (R) »=1 pelNo n " lp| 



where as before Ra denotes the restriction of a given input sequence to A. The following 
lemma describes the accuracy of such approximations. 



Lemma 4. Let v E h(^ ), let A: £ 2 (V ) ->■ ^(V d ) /iaue the form (78) for some R E M m , 



w/wZe A, given by (80) satisfies (79). TTien we Ziawe 



iav - avii < ^ e e cKw ii v> ^n ■ ( §i ) 



where 



i— 1 m 



i=i i=i+i 
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Proof. The usual insertion-triangle inequality argument for estimating differences of prod- 
ucts yields, upon using ( [76] ) and the definition of the constants C - ■ , 

|| Av - Av|| < ||£(A# - A«) ® (E c n A( 2 2 ) ® • • • ® AM) v 
Hi fii 






+ 2. 2. 



A« ® • • • ® Al™- 1 ) ] ® f Al m ) - Al m )) v 



n-^-ni ^ ^ ^-n m _i 



<4 Ell^-^® 1 ®""®^!! 

?ll 



The assertion (81) follows now, using ( |79[ ), from 

||[(A«-A( l 1 i ))®I®...»I]v||<^||[(A( l 1 i )-A« [p] )R A(1) ®I®...®I]v 



<E £ £ P |lV) 7r(1) (v) 



' l i.W 

and analogous estimates for the other summands. □ 

d) 
Remark 6. The constants C~ , depending on the operator and its approximation, may 

introduce a dependence on m. In certain cases, this dependence is exponential. For in- 
stance, in the case of an operator of the form A = B®B®-..®B with ||B|| < ||B||, we 

(i) urn — 1 

obtain C ~ = ||B|| m . This constant can therefore also strongly depend on an appropriate 
scaling of the problem under consideration. However, in the case of an operator 

A = B®I®---®I + I®B®I®---®I + ... + I<8> • • • <8>I<8>B, 

we obtain instead C~ < (m — 1)||B||. The dependence on m of C~ is thus related to the 
number of factors in each summand of A that are not identity operators. 

(i) 

The partitions A„/ will later be identified for a class of matrices studied in the context 
of wavelet methods (8l[32l . 



Definition 7. Let A be a countable index set and let s* > 0. We call an operator 
B: ^2 (A) —J- ^2 (A) s* -compressible if for any < s < s* , there exist summable positive 
sequences {otj)j>o, (Pj)j>o and for each j > 0, there exists Bj with at most afl 3 nonzero 
entries per row and column, such that ||B — Bj|| < Pj2~ SJ . For a given s*-compressible 
operator B, we denote the corresponding sequences by a(B), /3(B). 

Moreover, we say that a family of operators {B(n)} n is equi-s* -compressible if all B(n) 
are s*-compressible with the same choice of sequences (ay), (/3j) and in addition, for all 
A € A the number of nonzero elements in the rows and columns of the approximations 
B(n)j can be estimated jointly for all n in the form 



#(IK*' 



€ A: (B(n)j) A ,y / V (B(b)j)a-,a + 0}) < aj 2> . 



Example 4. To give a structural example, let us assume that {V'aJasV is an orthonormal 
wavelet basis on R. As before, let |A| denote the level of the basis function ip\. For 
c, a, /3 > 0, we denote by M c ,a,p the class of infinite matrices for which 

\bx,\'\ < c2-ll A l-l A 'H CT (l + dist(suppVA, supply)) ~ P . 
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Such bounds are known to hold, for instance, for wavelet representations of the double layer 
potential operator. Again, with a suitable rescaling of the wavelets, the representations 
of other potential types, as well as elliptic partial differential operators, exhibit the same 
decay of entries. It is shown in [81 Proposition 3.4] that (when specialized to the present 
case of one-dimensional factors) any B G M c ,(r,p with a > 1/2, /3 > 1 is s*-compressible 
with s* = mm{a - 1/2, j3 - 1}. 

If B(n) G -Mc(n),a(n),i3(n) with c(n) and cr(n) -1 ,/3(n) uniformly bounded, then from 
the construction in the proof of [8~l Proposition 3.4] it can be seen that the B(n) are 
equi-s*-compressible with s* = min{inf n cr(n) — l/2,inf n /?(n) — 1}, since the same set of 
nonzero matrix entries can be used for each n. 



The key property of s* -compressible matrices in the context of adaptive methods is 
that they are not only bounded in £2 but also on the smaller approximation spaces, and 
thus preserve sparsity in a quantifiable manner. We wish to establish such concepts next 
for the tensor setting. 



(0 



To this end, assume that the components A n / in A, given by (78), are s* -compressible 



(i) 



and let A„ . be the corresponding approximations according to Definition 

the spirit of the adaptive application of an operator in wavelet coordinates (see [8 



Quite in 

for 

approximating Av for given v G ^2(V d ), the a-priori knowledge about A in terms of 
s* -compressibility is to be combined with a-posteriori information on v. In fact, given 
v G ^2(V d ), we describe now how to construct for any J G IN approximations wj to the 
sequence Av as follows. For each i and for j G IN, let A,- be the support of the best 2 3 - 



(<) 



;« 



(0 



term approximation of 7r 2 '(v) so that, in particular, Ap C Ai^. If A n ' = I, we simply 



aW 



set A n l > = I. If A n l > / I, we let A 



(') 



;(0 



A 



(0 .. 



A 
V 



(') 



and 

\A W 



P \ "V 



J ' 



p = 0, . . . , J, 

p = J + l, 
p> J + l. 



(82) 



Moreover, let 



ni,]p] 



rii,J—pi 



0, 



p = 0,...,J, 
p> J. 



w 



(83) 



formed according to ( 80 ) are 



Note that, as a simple consequence of Definition u\ the A 
bounded independently of v. 

Lemma 5. Assume that the components A„ s of A as in (78) are s* -compressible. Given 
any v G ^(V d ), Jel, let 



£ ■ 

nSK m (R) 









W 



z A M R, m 
peMo 



M 



w#/t R^(i) , An/ defined by (82), (83), respectively. Then, whenever 7r^(v) G ^4 s /or 



"i.W 



some < s < s*, f/ie finitely supported sequence wj := Ajv satisfies 



|Av- Ajvll < 2~ sJ V C^i^fmaxllA^II + 



7T 



(i) 



(v) 



U s 



(84) 



as we// as 



# suppj A jv < Ri 



a 



(0l 



(85) 
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where the sequences a, J3 are defined as the componentwise maxima of the sequences in 
DefinitionYAfor each An- , that is, 

af := maxaJA®) , $f := max/3 7 -(A«) . (86) 

with A^ r , as in (83) and Al, , := Ar L according to (82). 
By s*-compressibility, we have |IaQ - A^JI < pf 2~< J -^ =: e£> p = 0, . . . , J, 



L n;,[J+i]H ~ IK*-"*!!' H^Af' 



|Ag - A®, j, ,|| = ||A§[|, ||R. W vr«(v)|| - for p > J + 1, and therefore 



>] 



7« J\^flW 9-^-p)||-R ,,, ^(0/ 



iiav-w.ii < £E^ E^V- s(J - p) IIV) - w (v)|| 

•=Hh=i [p=o n " H 

+ ||A«||||R aW 7t«(v)||). (87) 

By the choice of the AS and the definition of ||-||.4 s , we obtain ||R.(i) tt'*'(v)|| < 2 _sp ||7rW(v)||_4s 

IPj • A [p] 



for p = 0, . . . , J + 1, which confirms (84). Furthermore 



#supp 4 A JV < i^(aS } 2 J 2° + af^" 1 ! 1 + ... + 4 i} 2°2 J ) < ^Hc^H^- 7 , (88) 



which is (85). D 



Remark 7. Whenever v is finitely supported there exists a p(v) G Mq suc/i £/m£ AS 



/or i = 1, . . . ,m, p > p(v). Hence, the right hand side of (87) can 6e computed for each 
J G INo, where the sum over p terminates for J > p(v) at p(v). Further increasing J will 
then decrease all summands on the right hand side of (|87l). Therefore, fixing any s < s* 



(close to s* ), we can find for any n > the integer J(n) defined by 

Ri J 



J(n) := argmin E E ^E^V"^ IK« * W ( v )II 

+ ||AW||||R A(0 vr»(v)||}<n). (89) 

To further examine the properties of Aj(„iv for a given finitely supported v let 

Cf := ||d0||* , Cf := (max||A«|| + 0®\\ tl ) . (90) 

Theorem 8. Under the assumptions of Lemma^ on A and any given finitely supported 
v G hiy 1 ), for any n > let 

w r? := A J(»?) V = : A r?v, (91) 



where J{n) is defined by (89). Then 



\\Av- A,v|| <tj, (92) 

m i 

#sup Pj (A,v) < 2C£ ) 22 i iT« (E C f Ci^ill^WIUO 7 • ( 93 ) 






>2s+l 



t (0 (A„v)IU' < §T37 (4°) cfcf^ll^MII^, 04) 
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for all i = 1, . . . , m, where C~ is as in Lemma\A and the constants C^ , C~ are defined 







by (90) and are independent o/v, rj, andm. Moreover, 

rankj(Av) < Ri rankj(v), i = 1, . . . , m . 



(95) 



Proof. (92) follows from (|89|). The bound (93) is an immediate consequence of (85) 



Choosing for a given finitely supported v the mode frame system U = U(v) according to 
HOSVD, (95) is clear, since with TJw and a as in Lemma 4l we obtain 



Av - zJ z2 d («iA),,(«m,y 

n£K m (R)keM m i=l 






(96) 



where d 



(ni,fcl),...,(n m ,/c m ) — Cn«k- 



Without loss of generality it suffices to prove ( 94 ) only for i = 1 , which allows us to 



i) 



temporarily simplify the notation by writing Ar p i for Ar , . Note first that for each v\ £ V , 
using Proposition [3] followed by ( 77 ) , we obtain 



Hi 



tt« (Av) < Cf Y. *$> ( *ff ® I - ® Iv) 



ni=l 
Hi 



cf £ EI-i'TKAffu' 1 



h \ 2 Y 2 

VI \ ) J 



ni=l k 



(97) 



where we have used (53) in the last step for the mode frame system U(v) from Theorem 
pl In order to bound next the terms on the right hand side of (97) let 

A ni ,[ ] := supp range A^ R A[0] , 

A„, u := I |J supp range A^. R A[£] J \ ({J A ni)[j ]j , q > 0. 



v n lt [q] : = 



j+t=q 



i<q 



By the same argument as in (88), we also obtain 

#A niiM < ||d (1) ||42 9 . 
Since for q = 1, . . . , J, and each k, we have 



(98) 



i%„,,,AM , ii<Di%.,,,Aa ,R »-<> u ™ 



e=o 



:(i)- 



(i) 



.a) 



and since by @, R^^ AV Ra m = Ra^jO^i.j-* ~ A ni q -e+l) R ^r this can be 
estimated further by 

A'J,lTT fl 'l ..- ' ' 
A-, 



IR- aWu (1) II< VfllA^- A (1) ll + IIA^ -A (1) IIIIIRa U (1) l 



£=0 

q 



< £(0&2-<-M> +^_ 1 2-^- 1 ))||Ra m U fc 



Mi 



< (2\\^\\ e ^ (£(^ + /?« _i)2- 2s( - £ - 1) HRa m U«|' 2 ^ 3 
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Hence, abbreviating j£ := $j_£ + $~_£_i, we obtain 
[|R A M 7rW(A«®I 2 ---®I m v)|| 2 



<2 2 ^|| / 3«||, 1 ^|4 1 )| 2 ^ 7 ,2- 2 ^)||R %] ui 1 )|| 2 



£=0 



2 2s+1 ||/3 (1) ||, 1 5:7.2- 2s ^||Ra [ ^ (1) (v)|| 2 



1=0 



<2 2 ^||/3( 1 )||, 1 2- 2 ^|K( 1 )(v)|| 2 4s ^ 7 ,. 



£=0 



and thus 



R A M vr( 1 )(A«®I...»Iv)||<2-^2 s + 1 ||/3( 1 )||, 1 ||vr( 1 )(v)|U s 



^i.W 



(99) 



Recall that the sets A ni u are disjoint with #A ni r„i < ||a^ 1 '||£ 1 2 9 . By definition of the 
^4 s -quasi-norm, we have 

||7r( 1 )(A( l 1 i )®I...®Iv)|U s <sup(^#A niib1 ) S ^||R A tt« (A£> ® I • • • ® Iv) || . 



<?eIN 



J<9 



i>9 



Hence from (99) we infer 



|7r«(AW®I...®Iv)|U.<(2 2 ^ 1 (2--l)- 1 ||dW||? 1 ||^ 1 )||/ 1 )||7r«(v) 



\A* 



Since by the first inequality in (97), we have 



tt«(Av)|U. <4 1} i?f ^H^W® 1 "-® 1 ^!!* 



711=1 



we arrive at (94). 



□ 



Remark 8. The estimate (94) corresponds to the worst case that the sets A n . y ccm- 
structed in the proof are disjoint for different rii. If, on the contrary, the { A^ } ni are 



equi-s* -compressible, and hence these sets are the same for all rii, we can combine (97) 



directly with (99) to obtain instead 

lk W (Av)|U. < (cfYcfcfR^iv) 



\A" 



i.e., an improvement by a factor Rf. Similarly, in this case we also obtain that by a 



modification of (88), the estimate (93) can be replaced by 



m 1 

#su PPj (Av) < CIV- (Ecf C^lk W (v)IU-)' • 

i=i 

Remark 9. If r$ := rankj(v) < oo, the number ops(Av) of arithmetic operations for 
evaluating Av as in Lemma\8i for a given HOSVD o/v, can be estimated by 



i=l «=1 j=l 

with a constant independent o/v, 77, and m. 



(100) 
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Proof. The sorting of entries of 7P*'(v) required for obtaining the index sets of best 2 J - 
term approximations in Theorem [8] can be replaced by an approximate sorting by binary 
binning, requiring only #suppj(v) operations, as suggested in (i,26|. This only leads to 
a change in the generic constants in the resulting estimates. 

Let v have the HOSVD v = ^ fc flfcU^, then, on the one hand, we need to form the core 
tensor for the result, which takes Y]aLi R% r i operations, and evaluate the approximations 
to A„/U^, for m =- 1, . . . ,Ri and ki =- 1, . . . ,r,. The number of operations for each of 



these terms can be estimated as in |8|, which leads to (100). □ 



As the first term on the right hand side of (100) shows, the Tucker format still suffers 



from the curse of dimensionality due to the complexity of the core tensors. 

4.1.2 Hierarchical Tucker Format 

For applying operators to coefficient sequences given in the hierarchical Tucker format, we 
need a representation of these operators with analogous hierarchical structure. That is, in 
the representation 

m 

A= Y, C n(g) A ^ (101) 

nGK m (R) i=l 

for the finitely supported tensor c = (c n ) G ^(IN" 1 ) we need in addition a hierarchical 
decomposition 

c = £ Cm ({C^ : a G M(V rn ), v G IN}) , (102) 



see (35), (36). Here for a G N(T> m ), we extend the definition of representation ranks in 



the representation of A to each a G V m by setting Rn\ := Ri and 

R a :=#{v: C {a > u) /0}. (103) 

In what follows, we assume max QG x> m R a < oo and i?o m = 1- According to Theorem [4j 
v G ^(V ) has a representation 

v= Y, «kU k , a = S Cm ({B( Q ' fc )}). 
kew m 



If max a6 p m rank a (v) < oo, then Av can be represented in the form (96), with d again 
admitting a hierarchical representation in terms of matrices Y)\ a ^ u < k >>) on M 2 x W 2 with 
entries 

j-j(a,0,fc))) ._ q{oc,v) g(a,fc) 

That is, as in ( |35| ), we have an explicit representation 

d = E Vm ({D( Q '^ fc ))) : a G Af(V m ), v = 1, . . . , R a , k = 1, . . . , rank a (v)}) 



in (J96J), where the indices in k G IN are replaced in the definition of S© m (-) by the indices 
(u,k) G IN 2 . 

Example 5. To give a specific example, we consider an operator of the form 

Ai <g> h ® • • • ® I m + ... + Ii <8> • • • <8> Im-i <X> A m 
in the hierarchical format with dimension tree 

V m = {0 m , {1}, {2, . . . , m}, {2}, {3, . . . , m}, . . . , {m}} 
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as in Example Kq Setting A^' 



(') 



L A (i) 



as in (|102|) with R a = 2 for a 7^ TO , and 



C^"" 1 ) 



1 

1 



1 




Aj for i = 1, . . . , m, we obtain a representation 



1 



1 



a G N{V m ) \ {0 m } . 



The estimates in Theorem [8] now directly carry over to the hierarchical Tucker format, 
where as the only modification, ( [95J ) is replaced by 

rank a (Av) < R a rank Q (v) . 

Remark 10. If for a given TiSVD ofv, r a := rank a (v) < oo, a £ N(T> m ), the number 
ops(Av) of arithmetic operations for evaluating Av as in Theorems can be estimated by 



ops(Av) < Y^ R <x r * II R o g (cc)r Cq (a) 
aeAf(D m ) q=l 



mm i 

+ r ? -^C«^(^C^^ ) ^||vrW(v)|U s )% (104) 



i=l 



i=i 



with a constant independent ofv, r\, and m. 



Comparing the first summand on the right hand side of (104) to the one in (100), we 



observe a substantial reduction in complexity regarding the dependence on m (and hence 
d). 

4.2 Low-Rank Approximations of Operators 

In many applications of interest, the involved operators do not have an explicit low-rank 
form, but there exist efficient approximations to these operators in low-rank representation. 

Such a case can be handled by replacing a given operator A by such an approximation 
and then applying the construction for operators given in low-rank form as in the previous 
subsections. 

To make this precise, we assume that for a suitable growth sequence 7a, there exist 
approximations A^v for N 6 IN" with 



sup7 A (iV)||A- Ajv| 

N 



--: M A < oo , 



(105) 



where each A^v has a representation (101) with Ri < N. Moreover, in the case of the 



hierarchical Tucker format we assume in addition that R a < N with R a as in ( 103 ) 



Moreover, we need to quantify the approximability of the A^r- We assume that all 
tensor factors arising in each A^v are s*-compressible, and that for the approximations 



Atv of Atv according to Lemma 
Theorem [8] - we have 



and Theorem 



with constants C 



(i) 



r (i) 



C A x := sup(max<^ } WmaxC^ Cf) < oo . 



<€<" 



(106) 



Under these conditions, we shall say that the approximations A^r to A are uniformly 
s* -compressible. 

Under this assumption, the estimates for ops(Av) obtained in Remarks^ and 10 carry 
over to the present setting with additional low-rank approximation of the operator. Here 



37 



for given 77 > and v, we choose N such that ||A — Ajy[| < ??/2 and Ajy such that 
||Ajv"v — Atvv|| < r//2, which in summary yields for the Tucker format 

m 

ops(Av) < ( 7A 1 (2M A /r / )) m J] rank* (v) 

i=l 

m m 1 

+ C lA ? r"(7A 1 (2M A /r ? )) 1+S "^rank i (v)(^||vrW(v)m s )\ (107) 

i=l i=l 

and for the hierarchical Tucker format 

2 
ops(Av) < (7^ 1 (2M A /r/)) 3 ^ rank a (v) JJ rank Cg(ct) (v) 

a£j\f(D m ) q=l 

m m 1 

+ C iA ? ?^^A 1 (2M A /r ? )) 1+s "^rank J (v)(^||vrW(v)|U s ) i . (108) 

i=i i=i 



Note again the reduction in complexity in the first term of (108) over (107). 



5 An Adaptive Iterative Scheme 

5.1 Formulation and Basic Convergence Properties 

We have now all prerequisites in place to formulate an adaptive method whose basic 
structure resembles the one introduced in I9J for linear operator equations Au = f , where 
f G £2 and A is bounded and elliptic on £2, that is, 

(Av,v)£ 2 > A A ||v||! , ||Av||4 < A A ||v||£ 2 

holds for fixed constants A A , A A > 0. The scheme can be regarded as a perturbation of a 
simple Richardson iteration, 

v m := vi - w(Av» - f) , (109) 

which applies to both symmetric and nonsymmetric elliptic A. In both cases, the param- 
eter id > can be chosen such that ||I — wA[| < 1. 

Based on the developments in the previous sections, we have at hand numerically real- 
izable procedures apply, rhs, recompress, and coarsen, which for finitely supported 
v and any tolerance r\ > satisfy 

||Av-apply(v; 77)|| <r), ||f - rhs(t?)|| < 77, 

|| v — recompress(v;t7)|| < 7/, ||v — COARSEN(v;?7)|| < 77. 



Specifications of the complexities of these procedures will be summarized in { 5.2 The 



adaptive scheme that we analyze in what follows is given in Algorithm 5.1 



Proposition 5. Let the step size w > in Algorithm 5.1 satisfy ||I — wA|| < p < 1. Then 



output u £ of Algorithm 5.1 satisfies ||u e — u|| < e. 



the intermediate steps u^ of Algorithm 5.1 satisfy ||u^ — u|| < 6 k 5, and in particular, the 
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Algorithm 5.1 u £ = SOLVe(A, f; e) 



oj > and p 6 (0, 1) such that III — wAII < p, 
mout 

0, ki, «2) K 3 £ (0, 1) with ki + K2 + K3 < 1, and (5 > 0. 

output u £ satisfying ||u s — u|| < e. 
1: u :=0, <J:= A^Hfll 

2: k:=0, J := min{j : ^'(1 + (w + /3)j) < «i0} 
3: while 6 k 5 > e 
4: w := u fc , j t- 
5: repeat 
6: r/j := pj +1 9 k 5 

7: rj : = APPLY(w i; \r}j) - RHs(§rfc) 

Wj + i := RECOMPRESS(wj - U)Vj\ /3f]j) 

until (j > J V AxVllij-lH + (A A V + w + /3)?? i _i < ki^ ,+1 5) 
U fc+1 := COARSEN(RECOMPRESS(w J ;K 2 6' fc+1 5);K36' A:+1 (5) 
fc <- k + 1 
end while 



9 

10 
11 
12 
13 
14 



u e := u k 



Proof. Since K\ + K2 + K 3 < 1 ; it suffices to show that for any fc, after the termination of 
the inner loop the error bound 



|Wj - u|| < KxO^S (111) 



holds. By the choice of w, we have 

||w i+1 -u|| < ||(I-wA)(w j -u)||+w||(Aw i -/)-r i || 
< p\\wj - u|| + (u> + (3)rjj , 

and recursive application of this estimate yields 

j'-i 

|| Wj - u|| < ^ ||w - u|| + (lo + (3)Y, f^'^Vi < j (1 + j(u> + /3))^5 . 



111 



111 



Thus on the one hand, if the inner loop exits with the first condition in line 10 then 
holds by definition of J . On the other hand, if the second condition is met, then 
holds because 

||wj - u|| < p||w.,-_i - u|| + (ui + /3)r]j-i 

< ^(Iki-ill + rij-i) + (w + P)vj-i < me k+1 5 . □ 

5.2 Complexity 

Quite in the spirit of adaptive wavelet methods we analyze the performance of the above 
scheme by comparing it to an "optimality benchmark" addressing the following question: 
suppose the unknown solution exhibits a certain (unknown) rate of tensor approximability 
where the involved tensors have a certain (unknown) best iV-term approximability with 
respect to their wavelet representations. Does the scheme automatically recover these 
rates? Thus, unlike the situation in wavelet analysis we are dealing here with two types 
of approximation, and the choice of corresponding rates as a benchmark model should, 
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of course, be representative for relevant application scenarios. For the present complexity 
analysis, we focus on growth sequences of subexponential or exponential type for the in- 
volved low-rank approximations, combined with an algebraic approximation rate for the 
corresponding tensor mode frames. The rationale for this choice is the following. Ap- 
proximation rates in classical methods are governed by the regularity of the approximand 
which, unless the approximand is analytic, results in algebraic rates suffering from the 
curse of dimensionality. However, functions of many variables may very well exhibit a 
high degree of tensor sparsity without being very regular in the Sobolev or Besov sense. 
Therefore, fast tensor-rates combined with polynomial rates for the compressibility of the 
mode frames mark an ideal target scenario for tensor methods, since, as will be shown, 
the curse of dimensionality can be significantly ameliorated without requiring excessive 
regularity. 

The precise formulation of our benchmark model reads as follows. 

Assumptions 1. Concerning the tensor approximability of u, A, and f, we make the 
following assumptions: 

(i) u G An(lu) with 7u(n) = e du - n u for some d u > 0, b u > 1. 



ii) A satisfies ( |105[ ) for an M A > 0, with 7a(«) = e dAnl &A where d A > 0, b A > 1. 



(hi) Furthermore, let f G Ani'yr) with 7f(n) = e dfn f , where df = min{<i u , d A } and 
h = b u + b A . 

Concerning the approximability of lower- dimensional components, we assume that for some 
s* > 0, we have the following: 

(iv) 7P 2 ' (u) G A s for i = 1, . . . , m, for any s with < s < s* . 
(v) The low-rank approximations to A are uniformly s* -compressible in the sense of 



4-2, with C A := sup r?>0 C A A < oo ; where C AA is defined as in (106) for 



Subsection 
each value of rj. 

(vi) 7P 4 ' (f ) G A s for i = 1, . . . , m, for any s with < s < s* . 

Furthermore, we assume that the number of operations required for evaluating each required 
entry in the tensor approximations of A or f is uniformly bounded. 

Note that the requirement on f in (hi) is actually very mild because the data are 
typically more tensor sparse than the solution. 

The following complexity estimates are formulated only for the more interesting case 
of the hierarchical Tucker format. Similar statements hold for the Tucker format, involv- 
ing however additional terms that depend exponentially on m, which makes this format 
suitable only for moderate values of m. 

Remark 11. Let v have finite support with finite ranks, i.e., rank a (v) < oo for a G 
T> m . Then under AssumptionsUl apply can be realized numerically such that for w^ := 
apply(v; n) we have (see TheoremU^and RemarkFfy 

1 mi 

#su PPj (w,) < Ci(d A 1 ln(M A /r ? )) (1+s " )6A (^||^)(v)|U,)^-i , (112) 

i=i 

lk W K)IU* < C A {d A hn(M A / V )) {s+1)b ^(v)\\ A s , (113) 

|rank(wj| 00 < (d^ 1 ln(M A /r ? )) 6A |rank(v)| 00 , (114) 
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and, by (fl08j), 



-lt„fA/r /„^ 3b Ai „ i / \i3 



ops( Wr ,) < (m - l)^ ln(M A /r ? ))" DA |rank(v 



loo 



1 A 

+ ™C A (<*a ln(M A /r/)) (1+s " )6A |rank(v)| 0O (V||7r«(v)|U s ) I r/-^. (115) 



i=l 



Thus, up to polylogarithmic terms, the curse of dimensionality is avoided. If in addition 
the approximations of A are equi-s* -compressible, the polylogarithmic terms in the above 
estimates improve according to Remark^ 

Remark 12. Under Assumptions [7J the routine RHS can be realized numerically such that 
for f^ := rhs(t/) we have 

#supp,(f„)<rr^|K»(f)||>, (116) 

lk w (f.)IUs<lk w (f)IU*, (117) 

|rank(f 7? )| oc < {d, 1 ln(||f |U w ( 7f) /r?)) 6f , (118) 

as well as 

m 

ops(f„) < (ro-l)|rank(f^ + |rank(f 7 ,)| 0O ^#supp i (g . (119) 



Remark 13. We take recompress as a numerical realization ofP ri as defined in (43). 
This amounts to the computation of an HOSVD orHSVD, respectively, for which we have 
the complexity bounds given in Remarks^ and^ 



Likewise, coarsen is a numerical realization ofC v as defined in (66), with the modifi- 
cation of replacing the exact sorting of the values 7fo_(0> i = l,...,m, u £ X7 di , as required 
by C v , by an approximate sorting as proposed in \4,26l, see RemarkVh This leads to an 



increase of kc by only a fixed factor; for finitely supported v, the procedure can be realized 
in practice such that kq = 2yjm, and using a number of operations bounded by 



C|rank(v)|oo ^ #supp, ; (v) 



with a fixed C > 0. Note that here we make the implicit assumption that the orthogonality 
properties required by coarsen have been enforced if necessary before the application of 
COARSEN. This can be done by an application of RECOMPRESS (-,0). 

Note that under the assumptions of Proposition [5j the iteration converges for any 
fixed (3 > 0. A call to recompress (possibly with /3 = 0, i.e., without performing an 
approximation) is in fact necessary in each inner iteration to ensure the orthogonality 
properties required by APPLY. 

The main result of this paper is the following theorem. It says that whenever the 
solution has the approximation properties specified in Assumptions [TJ then the adaptive 
scheme recovers these rates and the required computational work has optimal complexity 
up to logarithmic factors. We have made an attempt to identify the dependencies of the 
involved constants on the problem parameters as explicitly as possible. 
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Theorem 9. Let a > and let Kp, hq be as in Theorem^ Let the constants K\, K2, K3 in 
Algorithm \5.1\ be chosen as 

K\ = (1 + (l + a)(«p + K C + K P K C )) , 

K2 = (1 + a)npKi , K3 = kc( k p + 1)(1 + «) K i • 



Lei Au = f, where A, u, f satisfy Assumptions^ Then u e produced by Algorithm \5.1\ 
satisfies 

IrankCu^loo < (d~ l ^[(Oa)- 1 p lu ||u|U w(7u) e^]) 6 " , (120) 

m m 1 

]T#supp,(u £ ) < (^HvtWHIU.V,- 1 . (121) 



1=1 8=1 



as we// as 



ll u «IU«(7u) ~ ll u IUw(7u)' ( 122 ) 

m m 

Elk W (u 6 )[U.<X)||7rW(u)[U.. (123) 



i=l 



Tae multiplicative constant in (122) depends only on a and m, those in (121) and (123) 



depend only on a, m and s. For the number of required operations, we have the estimate 

m 1 

ops(u £ ) < \i n£ \J(^-^ A+ 2 bl (£ max{ || 7r (0( u )| U . j || 7r (0(f)||^ } V e -i , (124) 



i=l 

u>i£/i a multiplicative constant independent of e and ||7P 4 '(u)||_4s ; ||7r^'(f)||^s ; and with an 
algebraic explicit dependence on m and Ca- 

Remark 14. The maximum number of inner iterations J that arises in the complexity 
estimate is defined in line [#] of Algorithm \5.1\ This value depends on the freely chosen 
algorithm parameters j3 and 0, on the constants u and p that depend only on A, and on 
k,\. Thus, J depends on m: The choice of K\ in Theorem^ leads to K\ ~ m^ 1 , and hence 
J ~ logm. Note that since |lne| clnm = 77j cln l lne l ? this leads to an algebraic dependence 
of the complexity estimate on m. Furthermore, the precise dependence of the constant in 
(124) on m is also influenced by the problem parameters from Assumption Uj which may 



contain additional implicit dependencies on m. In particular, as can be seen from the 
proof, the constant has a linear dependence on C A if C^ > 1 (cf. RemarkUn). 

Proof of Theorem^ By the choice of aci, K2, K3, we can apply Lemma [7] to each Uj pro- 
duced in line [Til of Algorithm [dTTJ which yields the bounds ( pOJ ), ( plj ), ([l22J), ( pi] ) for 

the values e = 9*6, fcGl. 

It therefore remains to estimate the computational complexity of each inner loop. Note 
that recompress in line [8] does not deteriorate the approximability of the intermediates 
Wj as a consequence of Lemma [3l 

Let ek '■= 5. We already know from Theorem [7J that 

|rank(u fc )| 00 < (d u x ln[a~V 7 u ll u IU„( 7 u) £ k *]) " < l ln ^l &u » ( 12 5) 

m m 1 

^#supp,(u fc ) < (£||7r«(u)|U.)' e -- , (126) 

i=l i=\ 

m m 

£ik w (udiu-;$£ik ( V)iU', ^ 12? ) 



i=\ i=l 
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where the multiplicative constants in the last two equations depend on a, m, and s. 
Similarly, we obtain (|122[) from (70). Furthermore, by definition of the iteration, 



|rank(w i+ i)| 00 < (c^ 1 ln(2M A /r/ i )) 6A |rank(w i )| 00 + (d f 1 ln(2\f\ AnM /r ]j )) b{ 
Combining this with ( |125| ) and using 6f > b u , we obtain 

\v a nk(w j )\ 00 <\\nE k \i bA+b *. 
The definition of the iterates also yields 

#supp i (w j+ i) < #supp i (w i ) 

1 m - 1 

+ C A (d A 1 ln(2M A /r ?J )) (1+S " 1)bA (Ell- ( °(w,)IU0^7 _ 

+ lk w (f)llW s > 



«=i 



and by (113), 

lk ( V;)IU*<lk (i) K-i)IU' 

+ ^ A (d A 1 ln(2M A /r / ,_ 1 )) (1+s)6A ||vrW(w J _ 1 )|U s+W |K«(f)|U s 

Using these estimates recursively together with ( |127 ), (126), we obtain 

||7rW(w i )|U-<|liiefc| J ' (1+a)6A ma X {||7rW(u)|U.,||7rW(f)|U.} 
and 

m m i 

^#supp,(w J )<|ln efc p( 1 +^ 1 )^^ max{ || 7r W( u )| Ufl ,|| vr W(f)| Ufl} )^^. 

1=1 8=1 



The total number of operations for the calls of APPLY in an inner loop according to (115) 
is dominated by that for the calls of recompress, which can be bounded up to a constant 
by 



|rank(wj)|^ + |rank(wj)|^ ^ #supp i (w J ) 



We thus arrive at (124). □ 



Remark 15. The above results apply directly to problems posed on separable tensor product 
Hilbert spaces, for which tensor product Riesz bases are available. Note, however, that this 
is not the case for standard Sobolev spaces W(Q, d ), which are not tensor product spaces 
themselves, but, for tensor product domains fl d , can be represented as intersections of d 
tensor product spaces. 

As mentioned in the introduction, from a sufficiently regular tensor product wavelet 
basis {§! v := %l) Ul (g> • • • ® ^u d } u e\/ d °f^2{£l d ), we can obtain a Riesz basis ofW(Q d ) by a 
level-dependent rescaling of basis functions, e.g., 



|2-smaxi|i/i|^, | 



ev d ' 
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To again arrive at a problem on £2, we now rewrite the original operator equation Au = f , 
with A: W(n d ) -> (W(n d ))', in the form 

V^ ('2~ s ( maXi l^l +maXi l^l)(A^ ^> u )) (2 smaXi ' /Ji '(w \I> )) 
ti<=V d 

= 2-< m ***\ v *\(f,iB v ), veV d . 

We thus obtain a well-posed problem on ^(V rf ) for the rescaled coefficient sequence u = 
2 smax *lwl( U) ^) an d the infinite matrix A = 2- s ( max *N+ max ^l)(,4'I' M , ¥„). 

This diagonal rescaling, which in the case of finite- dimensional Galerkin approxima- 
tions corresponds to a preconditioning of A, leads to additional problems in our context: 
the sequence (2~ smaXi ' Ul ') ue ^d (as well as possible equivalent alternatives) has infinite rank 
on the full index set X7 d . As a consequence, the ranks in an approximation of Av for some 
given v will in general depend on the range of values of \vi\, . . . ,\i/d\ f or the nonzero co- 
efficients Vj, lr .. „ d . This issue and possible strategies for handling it are discussed in more 
detail in JBj. The complexity analysis of iterative schemes in the case of A involving such 
a rescaling will be treated in a separate publication. 

6 Numerical Experiments 

We choose our example to illustrate the results of the previous section numerically ac- 
cording to several criteria. In order to arrive at a valid comparison between different 
dimensions, we choose a problem on Li2([0, 1] ) that has similar properties for different 
values of d. The problem has a discontinuous right hand side and solution, which means 
that reasonable convergence rates can be achieved only by adaptive approximation. It is 



also still sufficiently simple such that all constants used in Algorithm 5.1 can be chosen 
rigorously according to the requirements of the convergence analysis. 

We set Q := [0, l] d and use tensor order m = d. As an orthonormal wavelet basis 
{Vv}i/ev of L2([0, 1]), we use Alpert multiwavelets [l] of polynomial order p £ IN. Let 

(Tv)(t):= [ vds, 
Jo 

then T is a compact operator on L2QO, 1]) with ||T|| = 2/ir. The infinite matrix represen- 
tation ^(Ti/jfj,,i/j v )^ v is s*-compressible for any s* > 0. 
For / G 1*2(0), we consider the integral equation 

d 

(l-u d ®ST)u = f (128) 



with co d = i(f ) d . Note that for B := w d (g^ =1 T and A := I - B we have ||S|| = |, and 
therefore 

a-1 _ n d\-i 



00 


00 d 


5>* = 


= E^<S> Tk 


k=0 


k=0 i=l 



(I -By 

Furthermore, A := I — B is a nonsymmetric, L2-elliptic operator with (Av,v) > |||v||? ,^ 

as well as ||^4|| < |. Since A is the representation with respect to an orthonormal basis, 
we obtain A a = \ and A a = §■ Due to the special structure of the operator, choosing the 
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iteration parameter wasw:= 1, we have ||I — wA|| < | =: p. We choose the right hand 
side as 

oo d 

/ = (l-r)^(g)r fc A, A(x) : =v / 2^X[o,iA]Cos(27r 2 (/c + l)x), (129) 

fc=0 i=l 

where r E (0, 1). This gives ||M|l 2 ([o,1]) = 1 and II/IIl 2 (Q) = ll f II = 1, and 7r«(f) G ^ for 
any s > 0. The functions /& have jump discontinuities at vr _1 , which need to be resolved 
adaptively in order to maintain the optimal approximation rate for the given wavelet basis. 
From the expansion for (I — -B) -1 , we already know that 7n 4 '(u) g J[ s for any s < p, 
for i = 1, . . . , m. We also have the explicit representation 



(1-r) £ r k u n d ®T n f k 



u 

k,n=0 i=l 

For the choice of fk under consideration, evaluating ujj(£) i=1 T n fk, we obtain u — > / as 
d — >• oo; that is, the mode singular values of the solution approach exponential decay with 
rate r for growing d. Since ||m — /||l 2 is small for any d > 3, u has similar low-rank 
approximability for all relevant d. 

Hence for our particular choice of /, the action of A^ 1 is close to the identity. It 
should be emphasized, however, that this only simplifies the interpretation of the results, 
but does not simplify the problem from a computational point of view, since our algorithm 
does not make use of this particularity. We have also chosen a problem that is completely 
symmetric with respect to all variables to simplify the tests and the comparison between 
values of d, but do not make computational use of this symmetry. 

For the further constants arising in the iteration, we choose 6 := | and @ := 1. 
For the hierarchical Tucker format, we have Kp = y/2m — 3 and kq = \/rn, and fix the 
derived constants K\,K2,n>3 as in Theorem [9] by taking a := 1. Furthermore, we have 
J = A A 1 ||f|| = 2. 

Remark 16. Since many steps of the algorithm - including the comparably expensive ap- 
proximate application of lower- dimensional operators to tensor factors and QR factoriza- 
tions of mode frames - can be done independently for each mode, an effective parallelization 
of our adaptive scheme is quite easy to achieve. 

In all following examples, we use piecewise cubic wavelets. The implementation was 
done in C++ using standard LAPACK routines for linear algebra operations. Iterations 
are stopped as soon as a required wavelet index cannot be represented as a signed 64-bit 
integer. 

We make some simplifications in counting the number of required operations: For each 
matrix-matrix product, QR factorization, and SVD, we use the standard estimates for 
the required number of multiplications (see, e.g., |18|); for the approximation of A and 
f , we count one operation per multiplication with a matrix entry and per generated right 
hand side entry, respectively (note that we thus make the simplifying assumption that 
all required wavelet coefficients can be evaluated using 0(1) operations, which could in 
principle be realized in the present example, but is not strictly satisfied in our current im- 
plementation). We thus neglect some minor contributions that do not play any asymptotic 
role, such as the number of operations required for adding two tensor representations, and 
the sorting of tensor contraction values for coarsen, which here is done by a standard 
library call for simplicity. 
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6.1 Results with Right Hand Side of Rank 1 

For comparison, we first consider a simplified version of the right hand side reduced to the 
first summand, that is, 

/ = 



27r X[o,i/7r]Cos(2vr 



i=i 



In high dimensions, the solution u coincides with / up to very small correction terms. 
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Figure 1: Computed approximate residual norms (markers) and corresponding solution 
error estimates (solid lines), for / of rank one. 

The evolution of the computed approximate residual norms and the corresponding 
estimates for the L2-deviation from the solution of the infinite-dimensional problem is 
shown in Figure [I] Here one can clearly observe the effect of the coarsening steps after 
a certain number of inner iterations. Apart from the expected increase in the number J 
of such inner iterations with dimension, the iteration shows quite similar behaviour for 



different d. In particular, in each case the resulting iterates w,- in Algorithm 5.1 have rank 



1, the residuals Vj have ranks at most 3, thus the maximum rank arising in the iteration 
is 4. 

Note that the iteration is stopped a few steps earlier with increasing dimension because 
slightly stricter error tolerances are applied in the approximation of operator and right 
hand side. This means that the technical limit for the maximum possible wavelet level is 
reached earlier. 
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Figure 2: Total operation count at the end of each inner iteration in dependence on the 
estimated error (Dd = 32, o d = 64, x d = 128), for / of rank one. The triangle shows a 
slope of \. 



46 



We see that the number of operations, shown in Figure [2j increases at a rate close to 
the approximation order 4 of our wavelet basis. What is most remarkable here, however, 
is the very mild - almost linear - dependence of the total complexity on the dimension: a 
doubling of dimension leads to only slightly more than twice the number of operations. 

6.2 Results with Right Hand Side of Unbounded Rank 



We now use the full right hand side / as in ( 129 ), which leads to a solution with unbounded 



rank, and approximately the same exponential decay of singular values as /. 
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Figure 3: Computed approximate residual norms (markers) and corresponding solution 
error estimates (solid lines), for / of unbounded rank. 

As shown in Figure [3j the computed residual estimates and the corresponding estimates 
for the solution error behave quite similarly to the previous example. In the present case, 
the computed residual norms show a less regular pattern, which is mostly due to the 
adjustment of approximation ranks for the right hand side. 
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Figure 4: Maximum ranks of iterates Wj (solid lines) and of maximum rank of all inter- 
mediates arising in the inner iteration steps (dashed lines), for / of unbounded rank. 

The ranks of the produced iterates Wj , as well as those of the intermediate quantities 
arising in the iteration (see line M of Algorithm 5.1 prior to the recompression operation), 
shows a steady but controlled increase during the iteration, as shown in Figure |4j 

Note that in this case, the number of operations, shown in Figure [5j increases visi- 
bly faster than the limiting rate corresponding to the approximation order of the lower- 
dimensional multiresolution spaces. Due to the higher tensor ranks involved, this is to 
be expected in view of our complexity estimates. The increase of complexity with the 
problem dimension, however, still remains very moderate. 
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Figure 5: Operation count at the end of each inner iteration in dependence on the estimated 
error (□ d = 32, o d = 64, x d = 128), for / of unbounded rank. The triangle shows a 
slope of j. 

7 Conclusion and Outlook 

The presented theory and examples indicate that the schemes developed in this work 
can be applied to very high-dimensional problems, with a rigorous foundation for the 
type of elliptic operator equations considered here. The results can be extended to more 
general operator equations, as long as the variational formulation, in combination with a 
suitable basis, induces a well-conditioned isomorphism on £2- However, when the operator 
represents an isomorphism between spaces that are not simple tensor products, such as 
Sobolev spaces and their duals, additional concepts are required, which will be developed 
in a subsequent publication. 
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